Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some weirdness ...

2004-10-15 Thread Sean Chittenden
 changes.  It does change the location of the cache, moving it from 
userland down into the kernel.  This is a change for database 
administrators, but a good one, IMHO.  Previously, memory would be 
split roughly 25% kernel, 75% user because PostgreSQL needed the 
available RAM for its own cache.  Now that split can move closer to 
the opposite, 75% kernel, 25% user, because most of the memory is 
mmap(2)'ed pages instead of memory allocated in userland.

*) Pages can be protected via PROT_(EXEC|READ|WRITE).  For backends 
that aren't making changes to the DDL or system catalogs (permissions, 
etc.), pages loaded from the catalogs could be mapped with PROT_READ, 
which would prevent changes to the catalogs.  All DDL and 
permission-altering commands (anything that touches the system 
catalogs) would then remap the page with the PROT_WRITE bit set, make 
their changes, then PROT_READ the page again.  This would provide a 
first line of defense against buggy programs or exploits.
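
For what it's worth, here is a minimal sketch of that protection dance
in C.  It is illustrative only, not PostgreSQL source; the file name
and the single-byte "catalog change" are invented for the example:

/*
 * Sketch of the PROT_READ/PROT_WRITE dance described above.  Maps a
 * hypothetical catalog file read-only, upgrades to read-write around
 * a change, then drops back to read-only.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>

int
main(void)
{
    int fd = open("catalog.dat", O_RDWR);   /* hypothetical file */
    struct stat sb;

    if (fd == -1 || fstat(fd, &sb) == -1)
        exit(1);

    char *pg = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (pg == MAP_FAILED)
        exit(1);

    /* ... normal backends only ever read the mapping ... */

    /* DDL path: allow writes, mutate, then re-protect. */
    mprotect(pg, sb.st_size, PROT_READ | PROT_WRITE);
    pg[0] = 0x42;                           /* the "catalog change" */
    mprotect(pg, sb.st_size, PROT_READ);

    /* A stray store now delivers SIGSEGV instead of silently
     * corrupting the catalog: the "first line of defense". */
    return 0;
}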

*) Eliminates the double caching done currently (caching in both 
PostgreSQL and the kernel) by pushing the cache into the kernel... but 
without PostgreSQL having to know it's working on a page that lives in 
the kernel.

Please ask questions if you have them.
-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] First set of OSDL Shared Mem scalability results, some weirdness ...

2004-10-15 Thread Sean Chittenden
this.  The SUS text is a bit weaselly ("the application must ensure
correct synchronization") but the HPUX mmap man page, among others,
lays it on the line:
 It is also unspecified whether write references to a memory region
 mapped with MAP_SHARED are visible to processes reading the file and
 whether writes to a file are visible to processes that have mapped the
 modified portion of that file, except for the effect of msync().

It might work on particular OSes but I think depending on such behavior
would be folly...
Agreed.  Only OSes with a coherent filesystem buffer cache should ever 
use mmap(2).  For this to work on HP-UX, msync(2) would need to be 
used.
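
As a concrete (and purely illustrative) sketch of what that means,
assuming a MAP_SHARED mapping of an already-open file:

/*
 * After writing through a MAP_SHARED mapping on an OS without a
 * coherent/unified buffer cache (e.g. older HP-UX), flush the dirty
 * range so that processes using read(2) see the change.
 */
#include <sys/mman.h>
#include <string.h>

void
write_and_sync(int fd, size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return;

    memcpy(p, "new page image", 14);    /* modify the mapping */
    msync(p, len, MS_SYNC);             /* make the write visible/durable */
    munmap(p, len);
}

-sc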

--
Sean Chittenden
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Sean Chittenden
As someone else noted, this doesn't belong in the filesystem (rather 
the kernel's block I/O layer/buffer cache). But I agree, an API by 
which we can tell the kernel what kind of I/O behavior to expect would 
be good.
[snip]
The closest API to what you're describing that I'm aware of is 
posix_fadvise(). While that is technically-speaking a POSIX standard, 
it is not widely implemented (I know Linux 2.6 implements it; based on 
some quick googling, it looks like AIX does too).
Don't forget the existence/usefulness of the widely implemented 
madvise(2)/posix_madvise(2) calls, which can give the OS the following 
hints: MADV_NORMAL, MADV_SEQUENTIAL, MADV_RANDOM, MADV_WILLNEED, 
MADV_DONTNEED, and MADV_FREE.  :)
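
For reference, a quick illustration of that vocabulary; you would
issue whichever hint matches the upcoming access pattern (the function
names here are made up for the example):

#include <sys/types.h>
#include <sys/mman.h>

void
before_seq_scan(void *base, size_t len)
{
    /* Sequential scan coming: aggressive read-ahead is a win. */
    madvise(base, len, MADV_SEQUENTIAL);
}

void
before_index_probes(void *base, size_t len)
{
    /* Random access: read-ahead would only pollute the cache. */
    madvise(base, len, MADV_RANDOM);
}

void
done_with_region(void *base, size_t len)
{
    /* Not needed again soon: let the kernel reclaim the pages. */
    madvise(base, len, MADV_DONTNEED);
}

-sc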

--
Sean Chittenden
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some weirdness ...

2004-10-21 Thread Sean Chittenden
What performance issues do you think shared memory needs to have fixed?
We don't issue any shmem kernel calls after the initial shmget, so
comparing the level of kernel tenseness about shmget to the level of
tenseness about mmap is simply irrelevant.  Perhaps the reason you
don't see any traffic about this on the kernel lists is that shared
memory already works fine and doesn't need any fixing.
I'm gonna get flamed for this, but I think it's improperly used as a 
second-level cache on top of the operating system's cache.  mmap(2) 
would consolidate all caching into the kernel.

Please ask questions if you have them.
Do you have any arguments that are actually convincing?
Three things come to mind.
1) A single cache for pages
2) Ability to give access hints to the kernel regarding future IO
3) On-the-fly memory use for the cache.  There would be no need to 
preallocate slabs of shared memory at startup.

And a more minor point would be:
4) Not having shared pages get lost when the backend dies (mmap(2) 
uses refcounts and cleans itself up; no need for ipcs/ipcrm/ipcclean).  
This isn't too important in production, but it sucks doing PostgreSQL 
development on OS X because there is no ipcs/ipcrm command.
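
A tiny standalone program shows the cleanup semantics; MAP_ANON here
stands in for whatever shared mapping would actually be used, and the
counter is just for illustration:

#include <sys/mman.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int *counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
                        MAP_ANON | MAP_SHARED, -1, 0);

    if (counter == MAP_FAILED)
        return 1;
    *counter = 0;

    if (fork() == 0) {          /* a "backend" */
        (*counter)++;           /* visible to the parent (MAP_SHARED) */
        _exit(0);               /* even if it dies, no stale segment */
    }
    wait(NULL);
    printf("counter = %d\n", *counter);     /* prints 1 */
    return 0;                   /* last unmap frees the memory */
}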

What I just read was a proposal to essentially throw away not only the
entire low-level data access model, but the entire low-level locking
model, and start from scratch.
From the above list, steps 2, 3, 5, 6, and 7 would be different from 
our current approach, all of which could be safely handled with some 
#ifdef's on platforms that don't have mmap(2).

There is no possible way we could support both
this approach and the current one, which means that we'd be permanently
dropping support for all platforms without high-quality mmap
implementations;
Architecturally, I don't see any differences or incompatibilities that 
aren't solved with an #ifdef USE_MMAP/#else/#endif.

Furthermore, you didn't
give any really convincing reasons to think that the enormous effort
involved would be repaid.
Stevens has a great reimplementation of cat(1) that uses mmap(2) and 
benchmarks the two.  I did my own version of that here:

http://people.freebsd.org/~seanc/mmap_test/
When read(2)'ing/write(2)'ing /etc/services 100,000 times without 
mmap(2), it takes 82 seconds.  With mmap(2), it takes anywhere from 1.1 
to 18 seconds.  The worst-case scenario with mmap(2) yields a speedup 
by a factor of four.  The best-case scenario...  *shrug* something 
better than 4x.  I doubt PostgreSQL would see 4x speedups in the IO 
department, but I do think it would be vastly greater than the 3% 
suggested.
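
The harness at that URL is the real benchmark; the following is only a
skeleton of the two code paths being compared (the byte-touching loop
is there so the mmap(2) side actually faults every page in):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static volatile unsigned char sink;     /* defeat optimizing the loops away */

void
via_read(const char *path, char *buf, size_t bufsz)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    while ((n = read(fd, buf, bufsz)) > 0)      /* kernel->user copy */
        for (ssize_t i = 0; i < n; i++)
            sink = buf[i];
    close(fd);
}

void
via_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat sb;

    fstat(fd, &sb);
    unsigned char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    for (off_t i = 0; i < sb.st_size; i++)      /* no copy: kernel pages */
        sink = p[i];
    munmap(p, sb.st_size);
    close(fd);
}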

Those oprofile reports Josh just put up
showed 3% of the CPU time going into userspace/kernelspace copying.
Even assuming that that number consists entirely of reads and writes of
shared buffers (and of course no other kernel call ever transfers any
data across that boundary ;-)), there's no way we are going to buy into
this sort of project in hopes of a 3% win.
Would it be helpful if I created a test program that demonstrated the 
execution path for writing mmap(2)'ed pages as outlined above?

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Need advice on postgresql.conf settings

2004-11-09 Thread Sean Chittenden
The real issue is this: we have THE SAME queries taking anywhere from 
.001 - 90.0 seconds... the server is using 98% of the available RAM at 
all times (because of the persistent connections via php), and I don't 
know what to do.
Another possible line of attack is to use persistent (pooled)
connections to cut down the number of live backend processes you need.
However, depending on what your application software is, that might
take more time/effort (= money) than dropping in some more RAM.
This particular feature is pure evilness.  Using all of my fingers and 
toes, I can't count the number of times I've had a client do this and 
get themselves into a world of hurt.  Somewhere in the PHP 
documentation, there should be a big warning wrapped in the blink tag 
that steers people away from setting this.  The extra time necessary to 
set up a TCP connection is less than the performance drag induced on the 
backend when persistent connections are enabled.  Reread that last 
sentence until it sinks in.  On a local network, this is premature 
optimization that's hurting you.

max_files_per_process = 3052# min 25
You really have your kernel set to support 3052 * 75 simultaneously
open files?  Back this off.  I doubt values beyond a couple hundred buy
anything except headaches.
This, on the other hand, has made a large difference for me.  open(2) 
calls can be expensive, especially when the database is poorly designed 
and a backend touches many different parts of the database spread 
across multiple files.  3000 is high, but I've found 500 to be vastly 
too low in some cases... in others, it's just fine.  My rule of thumb 
has become: if you're doing lots of aggregate functions (ex: SUM(), 
COUNT()) more than once in the lifetime of a backend, increasing this 
value helps... otherwise it buys you little (if so, 1500 is generally 
sufficient).  Faster IO, however, is going to save you here.  If you 
can, increase your disk caching in the OS.  On FreeBSD, increase your 
KVA_PAGES and NBUFs.  Since you've freed up more RAM by disabling 
persistent connections, this shouldn't be a problem.  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] memcached and PostgreSQL

2004-11-18 Thread Sean Chittenden
So What does memcached offer pgsql users? It would still seem to offer
the benefit of a multi-machined cache.
Ack, I totally missed this thread.  Sorry for jumping in late.
Basically, memcached and pgmemcache offer a more technically correct 
way of implementing query caching.  MySQL's query caching is a 
disaster, IMHO.  memcached takes that load off the database and puts it 
elsewhere in a more optimized form.  The problem with memcached by 
itself is that you're relying on the application to invalidate the 
cache.  How many different places have to be kept in sync?  Using 
memcached in its current form means relying on the application being 
developed correctly, with centralized libraries and database access 
routines.  Bah, that's a cluster f#$@ waiting to happen.

pgmemcache fixes that, though, so that you don't have to worry about 
invalidating the cache in every application/routine.  Instead you just 
centralize that logic in the database and automatically invalidate via 
triggers.  It's working out very well for me.

I'd be interested in success stories, fwiw.  In the next week or so 
I'll probably stick this on pgfoundry and build a proper make/release 
structure.  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] memcached and PostgreSQL

2004-11-21 Thread Sean Chittenden
The big concern I have about memcache is that because it controls
storage external to the database there is no way to guarantee the cache
is consistent with the database.
I've found that letting applications add data to memcache and then 
letting the database replace or delete keys seems to be the best 
approach to minimize exactly this issue.  Having two clients update the 
cache is risky.  Using triggers or using NOTIFY + tailing logs makes 
this much more bulletproof.

This is similar to sending email in a trigger or on commit, where you
can't be certain you send email always and only on a commit.
While this is certainly a possibility, it's definitely the exception 
rather than the norm.

In the database, we mark everything we do with a transaction id and
mark the transaction id as committed in one operation.  I see no way to
do that with memcache.
Correct.  With an ON COMMIT trigger, it'll be easier to have a 100% 
accurate cache.  That said, memcache does exist outside of the 
database, so it's theoretically impossible to guarantee that the two 
are 100% in sync.  pgmemcache goes a long way towards keeping the 
cache in sync with the database, but it certainly doesn't guarantee 
it's in sync.  That being said, I haven't had any instances of it 
being out of sync since using pgmemcache (I'm quite proud of this, to 
be honest *grin*).  For critical operations such as financial 
transactions, however, I advise going to the database unless you're 
willing to swallow the financial cost of cache discrepancies.

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] memcached and PostgreSQL

2004-11-23 Thread Sean Chittenden
My point was that there are two failure cases --- one where the cache
is slightly out of date compared to the db server --- these are cases
where the cache update is slightly before/after the commit.
I was thinking about this and about ways to minimize it even further.  
Have memcache clients add data, and have a policy that the database 
only deletes data.  This sets the database up as the bottleneck again, 
but then you have a degree of transactionality that couldn't 
previously be achieved with the database issuing replace commands.  
For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction
This assumes that the client won't roll back or have a transaction 
failure.  Again, in 50M transactions, I doubt even one of them would 
fail (sure, it's possible, but that's a symptom of bigger problems: 
memcached isn't an RDBMS).

The update case being:
1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache
The second is where the cache update happens and the commit later
fails, or the commit happens and the cache update never happens.
Having pgmemcache delete, not replace, data addresses this second 
issue.  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Query planner plans very inefficient plans

2003-06-30 Thread Sean Chittenden
>I have somewhere around 3M rows in the image table, and 37K rows in the
>ancestry table. The following is representative of some of the common
>queries I issue:
> 
>select * from image natural join ancestry where ancestorid=100 and
>(state & 7::bigint) = 0::bigint;
> 
>When I ask postgres to EXPLAIN it, I get the following:
> 
>Merge Join  (cost=81858.22..81900.60 rows=124 width=49)
>  ->  Sort  (cost=81693.15..81693.15 rows=16288 width=41)
>        ->  Seq Scan on image  (cost=0.00..80279.17 rows=16288 width=41)
>  ->  Sort  (cost=165.06..165.06 rows=45 width=8)
>        ->  Index Scan using ancestry_ancestorid_key on ancestry  (cost=0.00..163.83 rows=45 width=8)
> 
>It appears to me that the query executes as follows:
> 
>1. Scan every row in the image table to find those where (state &
>7::bigint) = 0::bigint
>2. Sort the results
>3. Use an index on ancestry to find rows where ancestorid=100
>4. Sort the results
>5. Join the two

FWIW, I use INTs as bit vectors for options in various applications
and have run into this in a few cases.  In the database, I only care
about a few bits in the options INT, so what I did was create a
function for each of the bits that I care about and then a function
index.  Between the two, I've managed to solve my performance
problems.

CREATE FUNCTION app_option_foo_is_set(INT)
RETURNS BOOL
IMMUTABLE
AS '
BEGIN
    IF ($1 & 7::INT) <> 0 THEN
        RETURN TRUE;
    ELSE
        RETURN FALSE;
    END IF;
END;
' LANGUAGE 'plpgsql';
CREATE INDEX app_option_foo_fidx ON app_option_tbl (app_option_foo_is_set(options));
VACUUM ANALYZE;

Just make sure that you set your function to be IMMUTABLE. -sc


PS It'd be slick if PostgreSQL would collapse adjacent booleans into a
   bit in a byte: it'd save some apps a chunk of space.  32 options ==
   32 bytes with the type BOOL, but if adjacent BOOLs were collapsed,
   it'd only be 4 bytes on disk and maybe some page header data.

-- 
Sean Chittenden

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-03 Thread Sean Chittenden
What are the odds of going through and revamping some of the tunables
in postgresql.conf for the 7.4 release?  I was just working with
someone on IRC, and on their 7800 RPM IDE drives, their
random_page_cost was ideally suited to be 0.32: a far cry from 4.
Doing so has been a win across the board, and the problem query went
from about 40sec (seq scan) down to 0.25ms (using idx; higher than
0.32 resulted in a query time jump to 2sec, and at 0.4 it went back up
to a full seq scan at 40sec).

I know Josh is working on revamping the postgresql.conf file, but
would it be possible to include suggested values for various bits of
hardware and then solicit contributions from admins on this list who
have tuned their DB correctly?

## random_page_cost -- units are one sequential page fetch cost
#random_page_cost = 4    # default - very conservative
#random_page_cost = 0.9  # IDE 5200 RPM, 8MB disk cache
#random_page_cost = 0.3  # IDE 7800 RPM, 4MB disk cache
#random_page_cost = 0.1  # SCSI RAID 5, 10,000 RPM, 64MB cache
#random_page_cost = 0.05 # SCSI RAID 1+0, 15,000 RPM, 128MB cache
#...

## next_hardware_dependent_tunable
#hardware_dependent_tunable

I know these tables could get somewhat lengthy or organized
differently, but the file is read _once_ at _startup_, is seen by
thousands of DBAs, is visited at least once for every installation (at
the least to turn on TCP connections), and is often the only file other
than pg_hba.conf that gets modified or looked at.  This could be a very
nice way of introducing DBAs to tuning PostgreSQL and reducing the
number of people crying "PostgreSQL's slow."  Having postgresql.conf be
a clearing house of tunable values for various bits of hardware would
be a huge win for the community and would hopefully radically change
this database's perception.  At the top of the file, it would be useful
to include a blurb to the effect of:

# The default values for PostgreSQL are extremely conservative and are
# likely far from ideal for a site's needs.  Included in this
# configuration, however, are _suggested_ values to help aid in
# tuning.  The values below are not authoritative, merely contributed
# suggestions from PostgreSQL DBAs and committers who have
# successfully tuned their databases.  Please take these values as
# advisory only and remember that they will very likely have to be
# adjusted according to your site's specific needs.  If you have a
# piece of hardware that isn't mentioned below and have tuned your
# configuration aptly and have found a suggested value that the
# PostgreSQL community would benefit from, please send a description
# of the hardware, the name of the tunable, and the tuned value to
# [EMAIL PROTECTED] to be considered for inclusion in future
# releases.
#
# It should also go without saying that the PostgreSQL Global
# Development Group and its community of committers, contributors,
# administrators, and commercial supporters are absolved from any
# responsibility or liability with regards to the use of its software
# (see this software's license for details).  Any data loss,
# corruption, or performance degradation is the responsibility of the
# individual or group of individuals using/managing this installation.
#
# Hints to DBAs:
#
# *) Setup a regular backup schedule (hint: pg_dump(1)/pg_dumpall(1) +
#cron(8))
#
# *) Tuning: Use psql(1) to test out values before changing values for
#the entire database.  In psql(1), type:
#
#1) SHOW [tunable_name];
#2) SET [tunable_name] = [value];
#3) [run query]
#4) [repeat adjustments as necessary before setting a value here in
#   the postgresql.conf].
#5) [Send a SIGHUP signal to the backend to have the config values
#   re-read]
#
# *) Never use kill -9 on the backend to shut it down.
#
# *) VACUUM ANALYZE your databases regularly.
#
# *) Use EXPLAIN ANALYZE [query] to tune queries.
#
# *) Read the online documentation at:
#http://www.postgresql.org/docs/
#
# -- PostgreSQL Global Development Group

Just a thought.  A bit lengthy, but given that out of the box almost
every value is set to be extremely conservative (detrimentally so,
especially since the majority of users aren't running PostgreSQL on
embedded devices and are on reasonably new hardware, less than 3 years
old), and given that the config is only read in once and is generally
the only file viewed by DBAs, it'd make PostgreSQL more competitive in
the performance dept if there were some kind of suggested values for
various tunables.  Having someone whine, "my PostgreSQL database is
slow" is really getting old when it really isn't and a lack of tuning
is at fault; lowering the bar to a successful and speedy PostgreSQL
installation would be a win for everyone.  The person I was helping
also had the same data, schema, and query running on MySQL, and the
fastest it could go was 2.7s (about 40M rows in the table).

 -s

Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-03 Thread Sean Chittenden
> > What are the odds of going through and revamping some of the
> > tunables in postgresql.conf for the 7.4 release?  I was just
> > working with someone on IRC and on their 7800 RPM IDE drives,
> > their random_page_cost was ideally suited to be 0.32: a far cry
> > from 4.  Doing so has been a win across the board and the problem
> > query went from about 40sec (seq scan) down to 0.25ms (using idx,
> > higher than 0.32 resulted in a query time jump to 2sec, and at 0.4
> > it went back up to a full seq scan at 40sec).
> 
> I'm the guy who advocates settings of 1 to 2, and that still sounds
> low to me. :-) I'm wondering if the effective_cache_size was set
> properly, as well as there be enough buffers allocated.
> 
> I generally set effective cache size to 100,000 pages (800 megs or
> so) on my box, which is where it sits most days.  with this setting
> I've found that settings of under 1 are not usually necessary to
> force the planner to take the path of righteousness (i.e. the
> fastest one :-) 1.2 to 1.4 are optimal to me.

This is a nightly report that's run; cache sizes won't impact
performance of the query at all.  The planner was consistently
choosing a sequential scan over using the index until the
random_page_cost was set to 0.32.  After adjustment, the query just
flies (0.25ms@0.32 vs 2s@0.33 vs 40s@>0.4).  Since it's a nightly
report that only gets performed once a day and data is COPY'ed in once
every few minutes, there's a huge amount of data that's not cached,
nor should it be.

> Since theoretically a random_page_cost of 1 means no penalty to move
> the heads around, and there's ALWAYS a penalty for moving the heads
> around, we have to assume:
> 
> 1: That either the planner is making poor decisions on some other
> variable, and we can whack the planner in the head with a really low
> random page count.

By all accounts of having played with this query+data, this is the
correct assumption from what I can tell.

> OR 
> 
> 2: The other settings are suboptimal (buffers, sort_mem,
> effective_cache_size, etc...) and lowering random page costs helps
> there.

None of those other than possibly sort_mem had any impact on the
query, but even then, lower sort_mem doesn't help until the data's
been picked out of the table.  Sorting ~16k of rows is quicker with
more sort_mem.  Higher sort_mem has zero impact on fetching ~16K rows
out of a table with 40M rows of data.  Getting the planner to pick
using the index to filter out data inserted in the last 3 days over
doing a seq scan...  well, I don't know how you could do that without
changing the random_page_cost.  A good thump to the side of the head
would be welcome too if I'm wrong, just make sure it's a good thump
with the appropriate clue-bat.

> I've always wondered if most performance issues aren't a bit of both.  

Eh, in my experience, it's generally that random_page_cost needs to be
adjusted to match the hardware, and this value, with new hardware
every year, seems to keep getting lower.

> The answer, of course, is fixing the planner so that a
> random_page_cost of anything less than 1 would never be needed,
> since by design, anything under 1 represents a computer that likely
> doesn't exist (in theory of course.)  A 1 would be a machine that
> was using solid state hard drives and had the same cost in terms of
> OS paths to do random accesses as sequential.

Well, this could be a bug then, but I'm skeptical.  What's odd to me
is that changing the value among 0.32, 0.33, and 0.4 radically changes
the performance of the query.

> What constants in the planner, and / or formulas would be the likely
> culprits I wonder?  I've wandered through that page and wasn't sure
> what to play with.

random_page_cost should be proportional to the seek time necessary for
the disk to find a page of data on its platters.  It makes sense that
this value, as time progresses, gets smaller as hardware gets faster.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-03 Thread Sean Chittenden
> I'm curious how many of the configuration values can be determined 
> automatically, or with the help of some script.  It seem like there 
> could be some perl script in contrib that could help figure this out.  
> Possibly you are asked a bunch of questions and then the values are 
> computed based on that.   Something like:
> 
> How many tables will the system have?
> How much memory will be available to the postmaster?
> How many backends will there typically be?
> What is the avg seek time of the drive?
> What's the transfer rate of the drive?
> 
> Seems to me that a lot of reasonable default values can be figured out 
> from these basic questions.  FSM settings, Sort Mem, Random Page Cost, 
> Effective Cache Size, Shared Memory, etc, etc.

Someone was working on a thing called pg_autotune or some such program
that'd do exactly what you're thinking of.  

http://archives.postgresql.org/pgsql-performance/2002-10/msg00101.php
http://gborg.postgresql.org/project/pgautotune/projdisplay.php


-- 
Sean Chittenden

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-03 Thread Sean Chittenden
> > What are the odds of going through and revamping some of the
> > tunables in postgresql.conf for the 7.4 release?  I was just
> > working with someone on IRC and on their 7800 RPM IDE drives,
> > their random_page_cost was ideally suited to be 0.32: a far cry
> > from 4.
> 
> I find it very very hard to believe a random read was cheaper than a
> sequential read.  Something is shifty in your testing.

This is the procedure used to zero in on the number:

SET random_page_cost = 3;
[run query three times]
SET random_page_cost = 2;
[run query three times]
SET random_page_cost = 1;
[run query three times]
SET random_page_cost = 0.01;  -- verify that this tunable would make
  -- a difference eventually
[run query three times]
SET random_page_cost = 0.5;
[run query three times]
SET random_page_cost = 0.2;   -- this was the 1st query that didn't
  -- do a seq scan
[run query three times]
SET random_page_cost = 0.4;   -- back to a seq scan
[run query three times]
SET random_page_cost = 0.3;   -- idx scan, how high can I push the rpc?
[run query three times]
SET random_page_cost = 0.35;  -- interesting, the query time jumped to
  -- about 0.2s... better than 40s, but not as
  -- nice as the 0.25ms when the rpc was at 0.3
[run query three times]
SET random_page_cost = 0.32;  -- Sweet, 0.25ms for the query
[run query three times]
SET random_page_cost = 0.33;  -- Bah, back up to 0.2s
[run query three times]
SET random_page_cost = 0.31;  -- Down to 0.25ms, too low
[run query three times]
SET random_page_cost = 0.33;  -- Double check that it wasn't an errant
  -- performance at 0.33
[run query three times]
SET random_page_cost = 0.32;  -- Double check that 0.32 is the magic number
[run query three times]

[edit postgresql.conf && killall -SIGHUP postmaster]

-sc

-- 
Sean Chittenden




Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-03 Thread Sean Chittenden
 Limit  (cost=227274.85..227274.90 rows=20 width=12) (actual time=28.42..28.75 rows=20 loops=1)
   ->  Sort  (cost=227274.85..227316.97 rows=16848 width=12) (actual time=28.41..28.52 rows=21 loops=1)
         Sort Key: count(srca)
         ->  Aggregate  (cost=224828.52..226092.11 rows=16848 width=12) (actual time=20.26..28.13 rows=23 loops=1)
               ->  Group  (cost=224828.52..225670.91 rows=168478 width=12) (actual time=19.99..25.86 rows=320 loops=1)
                     ->  Sort  (cost=224828.52..225249.72 rows=168478 width=12) (actual time=19.98..21.76 rows=320 loops=1)
                           Sort Key: srca
                           ->  Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent  (cost=0.00..210202.76 rows=168478 width=12) (actual time=0.35..17.61 rows=320 loops=1)
                                 Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7))
                                 Filter: (NOT "action")
 Total runtime: 29.09 msec
(11 rows)

And there 'ya have it.  The times are different from when I had him
send me the queries this morning, but they're within an order of
magnitude of each other and show the point.  Oh, today they did a
bunch of pruning of old data (nuked June's data)... the runtime
differences are basically the same though.

> > I know Josh is working on revamping the postgresql.conf file, but
> > would it be possible to include suggested values for various bits of
> > hardware and then solicit contributions from admins on this list who
> > have tuned their DB correctly?
> 
> I think such material belongs in the SGML docs, not hidden away in a
> config file that people may not look at...

The config file isn't hidden though and is very visible in the tuning
process and to DBAs.  I don't know if any PostgreSQL distributions
ship with TCP connections enabled by default (FreeBSD doesn't), so the
config is always seen and viewed by DBAs.  If it's not the TCP
connections setting, it's the max connections setting or sort_mem,
etc...  Having the values dup'ed in the SGML would be good too, but
they're of most practical relevance in the actual config: as an admin
setting up a DB, I'd rather not have to fish around on postgresql.org
to find a recommended setting; having it inline and just having to
uncomment it is by far and away the most DBA-friendly approach and the
most likely to be used in the wild by admins.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-05 Thread Sean Chittenden
> Sean Chittenden <[EMAIL PROTECTED]> writes:
> > Getting the planner to pick
> > using the index to filter out data inserted in the last 3 days over
> > doing a seq scan...  well, I don't know how you could do that without
> > changing the random_page_cost.
> 
> This sounds a *whole* lot like a correlation issue.  If the data in
> question were scattered randomly in the table, it's likely that an
> indexscan would be a loser.  The recently-inserted data is probably
> clustered near the end of the table (especially if they're doing
> VACUUM FULL after data purges; are they?).  But the planner's
> correlation stats are much too crude to recognize that situation, if
> the rest of the table is not well-ordered.

Data isn't scattered randomly from what I can tell and is basically
already clustered, just because the data is inserted linearly and
based on time.  I don't think they're doing a VACUUM FULL after a
purge, but I'll double check on that on Monday when they get in.  Is
there an easy way of determining or setting a planner stat to suggest
that data is ordered around a column in a permanent way?  CLUSTER has
always been a one-shot deal, and its effects wear off quickly
depending on the way that data is inserted.  It seems as though this
would be a circumstance in which preallocated disk space would be a
win (that way data wouldn't always be appended to the heap and could
be inserted in order; of most use for non-time-related data, ex: some
non-unique ID).

> If their typical process involves a periodic data purge and then a
> VACUUM FULL, it might be worth experimenting with doing a CLUSTER on
> the timestamp index instead of the VACUUM FULL.  The CLUSTER would
> reclaim space as effectively as VACUUM FULL + REINDEX, and it would
> leave the table with an unmistakable 1.0 correlation ... which
> should tilt the planner towards an indexscan without needing a
> physically impossible random_page_cost to do it.  I think CLUSTER
> would probably be a little slower than VACUUM FULL but it's hard to
> be sure without trying.

Hrm, I understand what clustering does, I'm just not convinced that
it'll "fix" this performance problem unless CLUSTER sets some kind of
hint that ANALYZE uses to modify the way in which it collects
statistics.  Like I said, I'll let you know on Monday when they're
back in the shop, but I'm not holding my breath.  I know
random_page_cost is set to something physically impossible, but in
terms of performance, it's always been the biggest win for me to set
this puppy quite low.  Bug in the planner, or documentation
surrounding what this knob does, I'm not sure, but setting this to a
low value consistently yields good results for me.  The faster the
drive, the lower the random_page_cost value.  *shrug*

> That's one heck of a poor estimate for the number of rows returned.
> 
> > ->  Seq Scan on mss_fwevent  (cost=0.00..223312.60 rows=168478 width=12) (actual 
> > time=24253.66..24319.87 rows=320 loops=1)

The stats for the columns are already set to 1000 to aid with
this... don't know what else I can do here.  Having the planner off by
as much as even half the actual size isn't uncommon in my experience.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-05 Thread Sean Chittenden
> > # The default values for PostgreSQL are extremely conservative and
> > # are likely far from ideal for a site's needs.  Included in this
> > # configuration, however, are _suggested_ values to help aid in
> > # tuning.  [...]
> 
> This sort of narrative belongs in the SGML docs, not in a CONF file.
> In fact, one could argue that we should take *all* commentary out of
> the CONF file in order to force people to read the docs.

The SGML docs aren't in the DBA's face and are way out of the way for
DBAs rolling out a new system or who are tuning the system.  SGML ==
Developer, conf == DBA.

> Database performance tuning will always be a "black art," as it
> necessitates a broad knowledge of PostgreSQL, OS architecture, and
> computer hardware.  So I doubt that we can post docs that would
> allow any 10% time DBA to make PostgreSQL "fly", but hopefully over
> the next year we can make enough knowledge public to allow anyone to
> make PostgreSQL "sprint".

I'm highly resistant to/disappointed in this attitude and firmly
believe that there are well-understood algorithms that DBAs use to
diagnose and solve performance problems.  It's only a black art
because it hasn't been documented.  Performance tuning isn't voodoo;
it's adjusting constraints to align with the execution of
applications, and since we know what the applications do, the database
can mold to the applications' needs.  Some of those parameters are
based on hardware constraints and should be pooled and organized as
such.

random_page_cost ==
avg cost of a random disk seek/read (eg: disk seek time) ==
constant integer for a given piece of hardware

There are other settings that are RAM based as well, which should be
formulaic and derived, though a formula hasn't been defined to date.
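
As a back-of-the-envelope illustration of that seek-time claim (drive
numbers invented; real ones come off the spec sheet, and caching is
ignored entirely):

#include <stdio.h>

int
main(void)
{
    double seek_ms     = 8.5;   /* avg seek, hypothetical 7200 RPM disk */
    double rotation_ms = 4.2;   /* avg rotational latency (half a rev) */
    double xfer_mb_s   = 40.0;  /* sustained sequential transfer rate */
    double page_kb     = 8.0;   /* BLCKSZ */

    double seq_ms    = page_kb / 1024.0 / xfer_mb_s * 1000.0;
    double random_ms = seek_ms + rotation_ms + seq_ms;

    /* random_page_cost is in units of one sequential page fetch */
    printf("random_page_cost ~= %.0f\n", random_ms / seq_ms);
    return 0;   /* ~66 here: raw platter ratios are far higher than the
                 * defaults, which is why caching dominates in practice */
}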

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-05 Thread Sean Chittenden
> > The SGML docs aren't in the DBA's face and are way out of the way
> > for DBAs rolling out a new system or who are tuning the system.
> > SGML == Developer, conf == DBA.
> 
> That's exactly my point.  We cannot provide enough documentation in
> the CONF file without septupling its length.  IF we remove all
> commentary, and instead provide a pointer to the documentation, more
> DBAs will read it.

Which I don't think would happen, and why I think the terse bits that
are included are worthwhile.  :)

> > Some of those parameters are based on hardware constraints and
> > should be pooled and organized as such.
> > 
> > random_page_cost ==
> > avg cost of a random disk seek/read (eg: disk seek time) ==
> > constant integer for a given piece of hardware
> 
> But, you see, this is exactly what I'm talking about.
> random_page_cost isn't static to a specific piece of hardware ... it
> depends as well on what else is on:

*) the disk/array

translation: how fast data is accessed and over how many drives.

*) concurrent disk activity

A disk/database activity metric is different from the cost of a seek
on the platters.  :) The fact that PostgreSQL doesn't currently
support such a disk concurrency metric doesn't mean that its
definition should get rolled into a different number in an attempt to
accommodate for the lack thereof.

*) disk controller settings

This class of settings falls into the same settings that affect random
seeks on the platters/disk array(s).

*) filesystem

Again, this influences avg seek time

*) OS

Again, avg seek time

*) distribution of records and tables

This has nothing to do with PostgreSQL's random_page_cost setting
other than that if data is fragmented on the platter, the disk is
going to have to do a lot of seeking.  This is a stat that should get
set by ANALYZE, not by a human.

*) arrangement of the partitions on disk

Again, avg seek time.

> One can certainly get a "good enough" value by benchmarking the
> disk's random seek and calculating based on that ... but to get an
> "ideal" value requires a long interactive session by someone with
> experience and in-depth knowledge of the machine and database.

An "ideal" value isn't obtained via guess and check.  Checking is only
the verification of some calculable set of settingsthough right now
those calculated settings are guessed, unfortunately.

> > There are other settings that are RAM based as well, which should
> > be formulaic and derived though a formula hasn't been defined to
> > date.
> 
> You seem pretty passionate about this ... how about you help me and
> Kevin define a benchmarking suite when I get back into the country
> (July 17)?  If we're going to define formulas, it requires that we
> have a near-comprehensive and consistent test database and test
> battery that we can run on a variety of machines and platforms.

Works for me, though a benchmark will be less valuable than adding a
disk concurrency stat, improving data trend/distribution analysis, and
using numbers that are concrete and obtainable through the OS kernel
API or an admin manually plunking numbers in.  I'm still recovering
from my move from Cali to WA so with any luck, I'll be settled in by
then.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Sean Chittenden
> I don't have much to add because I'm pretty new to Postgres and have
> been soliciting advice here recently, but I totally agree with
> everything you said.  I don't mind if it's in the postgres.conf file
> or in a faq that is easy to find, I just would like it to be in one
> place.  A good example of the need for this is when I was tuning
> "effective_cache" I thought that was creating a cache for Postgres
> when in fact as it was pointed out to me, it's just hinting to
> postgres the size of the OS cache.  Lots of ways for people to get
> really confused here.

I looked through src/doc/runtime.sgml for a good place to stick this
and couldn't find a place where it seemed appropriate, but on FreeBSD,
this can be determined with a great deal of precision in a
programmatic manner:

echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too.  It'd be slick if PostgreSQL
could tune itself (on FreeBSD) at initdb time with the above code.  If
Linux exports this info via /proc and can whip out the appropriate
magic, even better.  An uncommented-out good guess that shows up in
postgresql.conf would be stellar and quite possible with the use of
sed.
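
A minimal sketch of that C equivalent (FreeBSD-specific;
vfs.hibufspace may not exist on other platforms or in later releases,
and the 8192 divisor matches the default BLCKSZ):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
    int hibufspace;
    size_t len = sizeof(hibufspace);

    if (sysctlbyname("vfs.hibufspace", &hibufspace, &len, NULL, 0) == -1) {
        perror("sysctlbyname");
        return 1;
    }
    printf("effective_cache_size = %d\n", hibufspace / 8192);
    return 0;
}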

Maybe an initdb switch could be added to have initdb tune the config
it generates?  If a -n is added, have it generate a config and toss it
to stdout?


case `uname` in
"FreeBSD")
    echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
    ;;
*)
    echo "Unable to automatically determine the effective cache size" >> /dev/stderr
    ;;
esac


-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-08-05 Thread Sean Chittenden
mss_masterlog=> EXPLAIN ANALYZE SELECT srca, COUNT(srca) FROM mss_fwevent WHERE
mss_masterlog->     sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL)
mss_masterlog->     AND NOT action GROUP BY srca ORDER BY COUNT DESC LIMIT 20;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=227274.85..227274.90 rows=20 width=12) (actual time=28.42..28.75 rows=20 loops=1)
   ->  Sort  (cost=227274.85..227316.97 rows=16848 width=12) (actual time=28.41..28.52 rows=21 loops=1)
         Sort Key: count(srca)
         ->  Aggregate  (cost=224828.52..226092.11 rows=16848 width=12) (actual time=20.26..28.13 rows=23 loops=1)
               ->  Group  (cost=224828.52..225670.91 rows=168478 width=12) (actual time=19.99..25.86 rows=320 loops=1)
                     ->  Sort  (cost=224828.52..225249.72 rows=168478 width=12) (actual time=19.98..21.76 rows=320 loops=1)
                           Sort Key: srca
                           ->  Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent  (cost=0.00..210202.76 rows=168478 width=12) (actual time=0.35..17.61 rows=320 loops=1)
                                 Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7))
                                 Filter: (NOT "action")
 Total runtime: 29.09 msec
(11 rows)

-- 
Sean Chittenden

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-08-09 Thread Sean Chittenden
> >> I have an experimental patch lying around somewhere that tries to
> >> work around these problems by offering different estimation methods
> >> for index scans.  If you are interested, I'll dig it out.
> >
> >Sure, I'll take a gander... had my head in enough Knuth recently to
> >even hopefully have some kind of a useful response to the patch.
> 
> Sean, the patch is at http://www.pivot.at/pg/16-correlation-732.diff.
> A short description of its usage can be found at
> http://archives.postgresql.org/pgsql-performance/2002-11/msg00256.php.
> If you are interested how the different interpolation methods work,
> read the source - it shouldn't be too hard to find.
> 
> You might also want to read the thread starting at
> http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php.

Hrm... let me bop back in my archives and reply there...  very
interesting work though.  I hope a reasonable algorithm can be found
in time for 7.5, or even 7.4, as this seems to be biting many people
and the current algo is clearly not right.

> >>  does this mean that sensorid, evtime, and action are not
> >> independent?
> >
> >Hrm...  sensorid is sequence and grows proportional with evtime,
> >obviously.
> 
> So a *low* sensorid (7) is quite uncommon for a *late* evtime?  This
> would help understand the problem.  Unfortunately I have no clue what
> to do about it.  :-(

Correct.

> >Having spent a fair amount of time looking at the two following plans,
> >it seems as though an additional statistic is needed to change the
> >cost of doing an index lookup when the index is linearly ordered.
> 
> I'm not sure I understand what you mean by "index is linearly
> ordered",  but I guess correlation is that statistic you are talking
> about.  However, it is calculated per column, not per index.

If two rows are id's 123456 and 123457, what are the odds that the
tuples are going to be on the same page?  ie, if 123456 is read, is
123457 already in the OS or PostgreSQL's disk cache?

> >Whether CLUSTER does this or not, I don't know,
> 
> If you CLUSTER on an index and then ANALYSE, you get a correlation of
> 1.0 (== optimum) for the first column of the index.

Correlating of what to what?  Of data to nearby data?  Of data to
related data (ie, multi-column index?)? Of related data to pages on
disk?  Not 100% sure in what context you're using the word
correlation...

But that value will degrade over time, and at what rate?  Does ANALYZE
maintain that value so that it's kept accurate?  The ANALYZE page was
lacking in implementation details, such as how many rows ANALYZE
actually scans on big tables, which could dramatically affect the
correlation of a table over time if ANALYZE is maintaining the
correlation for a column.

> > I never heard back from him after getting the runtime down to a
> > few ms.  :-/
> 
> Pity!  I'd have liked to see EXPLAIN ANALYSE for
> 
>   SELECT *
> FROM mss_fwevent
>WHERE sensorid = 7
>  AND evtime > (now() - '6 hours'::INTERVAL)
>  AND NOT action;
> 
>   SELECT *
> FROM mss_fwevent
>WHERE sensorid = 7
>  AND evtime > (now() - '6 hours'::INTERVAL);
> 
>   SELECT *
> FROM mss_fwevent
>WHERE evtime > (now() - '6 hours'::INTERVAL);
> 
>   SELECT *
> FROM mss_fwevent
>WHERE sensorid = 7;

ditto

> > Are indexes
> >on linearly ordered data rebalanced somehow?  I thought CLUSTER only
> >reordered data on disk.  -sc
> 
> AFAIK CLUSTER re-creates all indices belonging to the table.

As of 7.3 or 7.4, yes.  -sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Sean Chittenden
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> > 
> > Dudes, seriously - switch to FreeBSD :P
> 
> Yeah, it's nice to have a BUG FREE OS huh? ;^)
> 
> And yes, I've used FreeBSD, it's quite good, but I kept getting the
> feeling it wasn't quite done.  Especially the installation
> documentation.

While the handbook isn't the same as reading the actual source, nor is
it the only FreeBSD documentation, it certainly is quite good (to the
point that publishers see only a small market for FreeBSD books
because the documentation provided by the project is so good), IMHO.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/

If anyone on this list has any issues with the documentation, please
take them up with me _privately_ and I will do my best to either
address or correct the problem.

Now, back to our regularly scheduled and on topic programming...  -sc

-- 
Sean Chittenden
"(PostgreSQL|FreeBSD).org - The Power To Serve"

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Sean Chittenden
> Andrew Sullivan's fairly regular response is that he tried (albeit
> not VASTLY extensively) to distinguish between disks when working
> with fibre arrays, and he couldn't measure an improvement in
> shifting WAL (the OBVIOUS thing to shift) to separate disks.

Real quick... the faster the drives, the less important it is to move
WAL onto a different drive.  The slower the drives, the more important
this is... which is why this isn't as necessary (if at all) for large
production environments.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
> What it still leaves quite open is just what happens when the OS has
> more than one disk drive or CPU to play with.  It's not clear what
> happens in such cases, whether FreeBSD would catch up, or be "left
> further in the dust."  The traditional "propaganda" has been that
> there are all sorts of reasons to expect PostgreSQL on FreeBSD to
> run a bit faster than on Linux; it is a bit unexpected for the
> opposite to seem true.

Let me nip this in the bud before people run away with ideas that
aren't correct.  When the tests were performed on FreeBSD 5.1 and
Linux, the hard drives were running UDMA.  When running 4.8, for some
reason his drives settled in on PIO mode:

ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to PIO mode

The benchmarks were hardly conclusive, as UDMA runs vastly faster than
PIO.  Until we hear back as to whether cables were jarred loose
between the tests or whether something else changed, I'd hardly
consider these conclusive tests, given that PIO/UDMA is apples to
oranges in terms of speed.  I fully expect that FreeBSD 4.8 will
perform faster than 5.1 (5.x is still being unwound from Giant), and
it should outperform Linux as well if industry experience is any
indicator.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
> I need to step in and do 2 things:

Thanks for posting that.  Let me know if you have any questions while
doing your testing.  I've found that using 16K blocks on FreeBSD
results in about an 8% speedup in writes to the database, fwiw.

I'm likely going to make this the default for PostgreSQL on FreeBSD
starting with 7.4 (just posted something to -hackers about this).  If
you'd like to do this in your testing, just apply the following patch.

Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K
blocks, which means that currently, reading two blocks of data in PG
is two read calls to the OS: one reads 16K of data off disk and
returns the 1st page, and the 2nd call pulls the 2nd block from the FS
cache.  Making things 16K avoids the need for the 2nd system call,
which is where the performance difference is coming from, afaict.
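
To illustrate the syscall arithmetic (a toy sketch, not the patch
itself; /etc/services is just a convenient file to read):

#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    char buf[16384];
    int fd = open("/etc/services", O_RDONLY);

    /* BLCKSZ = 8192: two trips into the kernel for 16K of data ... */
    read(fd, buf, 8192);
    read(fd, buf + 8192, 8192);

    /* ... vs. BLCKSZ = 16384: one trip, and on a filesystem with 16K
     * blocks the whole FS block comes back in a single call. */
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, 16384);

    close(fd);
    return 0;
}

-sc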

-- 
Sean Chittenden
Index: src/include/pg_config_manual.h
===================================================================
RCS file: /home/ncvs/pgsql/pgsql-server/src/include/pg_config_manual.h,v
retrieving revision 1.5
diff -u -r1.5 pg_config_manual.h
--- src/include/pg_config_manual.h  4 Aug 2003 00:43:29 -   1.5
+++ src/include/pg_config_manual.h  27 Aug 2003 17:40:12 -
@@ -23,7 +23,7 @@
  *
  * Changing BLCKSZ requires an initdb.
  */
-#define BLCKSZ 8192
+#define BLCKSZ 16384
 
 /*
  * RELSEG_SIZE is the maximum number of blocks allowed in one disk

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
> >> I need to step in and do 2 things:
> SC> Thanks for posting that.  Let me know if you have any questions while
> SC> doing your testing.  I've found that using 16K blocks on FreeBSD
> SC> results in about an 8% speedup in writes to the database, fwiw.
> 
> ok.. ignore my prior request about how to set that... i missed you
> had included a patch.
> 
> Any recommendations on newfs parameters for an overly large file
> system used solely for Postgres?  Over 100Gb (with raid 10) or over
> 200Gb (with raid 5)?

Nope, you'll have to test and see.  If you find something that works,
however, let me know.  -sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] bad estimates

2003-08-29 Thread Sean Chittenden
> >If you want both the max and the min, then things are going to be a
> >bit more work. You are either going to want to do two separate
> >selects or join two selects or use subselects. If there aren't
> >enough prices per stock, the sequential scan might be fastest since
> >you only need to go through the table once and don't have to hit
> >the index blocks.
> >
> >It is still odd that you didn't get a big speed up for just the min though.
> 
> I found I'm suffering from an effect detailed in a previous thread titled
> 
>   Does "correlation" mislead the optimizer on large tables?

I don't know about large tables, but this is a big problem and
something I'm going to spend some time validating later today.  I
think Manfred's patch is pretty good and certainly better than where
we are but I haven't used it yet to see if it's the magic ticket for
many of these index problems.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] bad estimates

2003-08-29 Thread Sean Chittenden
> >>I found I'm suffering from an effect detailed in a previous thread titled
> >>
> >>Does "correlation" mislead the optimizer on large tables?
> >
> >
> >I don't know about large tables, but this is a big problem and
> >something I'm going to spend some time validating later today.  I
> >think Manfred's patch is pretty good and certainly better than where
> >we are but I haven't used it yet to see if it's the magic ticket for
> >many of these index problems.
> 
> I had to dig through a lot of archives to find this.  Is this the patch, 
> from last October?
> 
> http://members.aon.at/pivot/pg/16-correlation.diff
> 
> If so, I'll try it out and report my results.

Same guy, but that patch is pretty out of date and has been replaced
by some newer work that's much better.

From: Manfred Koizar <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: [HACKERS] Correlation in cost_index()
Date: Wed, 20 Aug 2003 19:57:12 +0200
Message-ID: <[EMAIL PROTECTED]>


and


From: Manfred Koizar <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [HACKERS] Again on index correlation
Date: Wed, 20 Aug 2003 21:21:14 +0200
Message-ID: <[EMAIL PROTECTED]>


-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] FreeBSD page size

2003-09-03 Thread Sean Chittenden
> Ok... simple tests have completed.  Here are some numbers.
> 
> FreeBSD 4.8
> PG 7.4b2
> 4GB Ram
> Dual Xeon 2.4GHz processors
> 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
>  config with 32k stripe size
[snip]
> Then I took the suggestion to update PG's page size to 16k and did the
> same increase on sort_mem and checkpoint_segments as above.  I also
> halved the shared_buffers and max_fsm_pages  (probably should have
> halved the effective_cache_size too...)
> 
> restore time: 11322 seconds
> vacuum analyze time: 27 minutes
> select count(*) from user_list where owner_id=315;   48267.66 ms
> 
> 
> Granted, given this simple test it is hard to say whether the 16k
> blocks will make an improvement under live load, but I'm gonna give it
> a shot.  The 16k block size shows me roughly 2-6% improvement on these
> tests.
> 
> So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
> to tell which parameters need to be halved to account for it).

I haven't had a chance to run any tests yet (ELIFE), but there was a
suggestion that 32K blocks perform even better than 16K blocks
(!!??!!??).  I'm not sure why this would be, and my only guess is that
it relies more heavily on the disk cache to ease IO.  Since you have
the hardware setup, Vivek, would it be possible for you to run a test
with 32K blocks?

I've started writing a threaded benchmarking program called pg_crush
that I hope to post here in a few days that'll time connection startup
times, INSERTs, DELETEs, UPDATEs, and both sequential scans as well as
index scans for random and sequentially ordered tuples.  It's similar
to pgbench, except it generates its own data, uses pthreads (cheers
for KSE!), and returns more fine-grained timing information for the
various activities.

-sc

-- 
Sean Chittenden



Re: [PERFORM] PostgreSQL Reliability when fsync = false on Linux-XFS

2003-09-04 Thread Sean Chittenden
> Just wondering.  What if you symlink WAL to a directory which is on a
> mounted USB RAM drive?

USB 2.0 you mean?  It supposedly runs at 1394 speeds, but USB 1.0/1.1
runs at 1MB/s under ideal circumstances... that's slower than even old
IDE drives.

> Will that increase any throughput?

Probably not...

> I am sure a 256/512MB flash drive will cost a lot less than a SCSI
> disk.  Maybe even a GB on a flash drive would do...

That's true... but on a $$/MB basis, you're better off investing in
RAM and increasing your effective_cache_size.  If dd to a flash card
is faster than to an IDE drive, please let me know.  :) -sc

-- 
Sean Chittenden
UNIX(TM), a BSD like Operating System



Re: [PERFORM] Performance issue

2003-09-24 Thread Sean Chittenden
> My statistics(Athlon 1.8Ghz)
> 
> 20,000 items      takes on average 0.078 seconds/room
> 385,000 items     takes on average 0.11 seconds/room
> 690,000 items     takes on average 0.270 seconds/room
> 1,028,000 items   takes on average 0.475 seconds/room
[snip]
>  I am running Red Hat 8.  Some of my conf entries that I have changed
> follow:
> shared_buffers = 3700
> effective_cache_size = 4000
> sort_mem = 32168

Have you twiddled with your wal_buffers or checkpoint_segments?  Might
be something to look at.
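
A starting point might look something like this in postgresql.conf
(numbers are only illustrative for a write-heavy box, not gospel; the
7.x defaults are 8 and 3, respectively):

wal_buffers = 16
checkpoint_segments = 16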

-sc

-- 
Sean Chittenden



Re: [PERFORM] count(*) slow on large tables

2003-10-06 Thread Sean Chittenden
> How will it help?  This is in addition to the trigger proposal that
> came up earlier.  With triggers it's not possible to make values
> visible across backends unless the trigger updates a table, which
> eventually leads to the vacuum/dead-tuples problem.
> 
> 1. User creates a trigger to check updates/inserts for certain conditions.
> 2. It updates the count as and when required.
> 3. If the trigger detects the count is not initialized, it would issue
>    the full query the first time.  There is no avoiding this issue.
> 
> Besides, a facility for resident variables could be used imaginatively
> as well.
> 
> Does this make sense?  IMO this is a more generalised approach overall.

I do this _VERY_ frequently in my databases, only I have my stored
procs do the aggregate in a predefined MVCC table that's always there.
Here's a denormalized version for public consumption/thought:

CREATE TABLE global.dba_aggregate_cache (
  dbl TEXT NOT NULL,    -- The database location, doesn't need to be
                        -- qualified (ex: schema.table.col)
  op TEXT NOT NULL,     -- The operation, SUM, COUNT, etc.
  qual TEXT,            -- Any kind of conditional, such as a where clause
  val_int INT,          -- Whatever the value is, of type INT
  val_bigint BIGINT,    -- Whatever the value is, of type BIGINT
  val_text TEXT,        -- Whatever the value is, of type TEXT
  val_bytea BYTEA       -- Whatever the value is, of type BYTEA
);
CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx
    ON global.dba_aggregate_cache (dbl, op);

Then, I use a function to retrieve this value instead of a SELECT
COUNT(*).

SELECT public.cache_count('dbl','qual');  -- In this case, the op is COUNT
SELECT public.cache_count('dbl');         -- Returns the COUNT for the
                                          -- table listed in the dbl

Then, I create 4 or 5 functions (depends on the op I'm performing):

1) A private function that _doesn't_ run as security definer, which
   populates the global.dba_aggregate_cache row if it's empty.
2) A STABLE function for SELECTs; if the row doesn't exist, it
   calls function #1 to populate its existence.
3) A VOLATILE function for INSERTs; if the row doesn't exist, it
   calls function #1 to populate its existence, then adds the
   necessary bits to make it accurate.
4) A VOLATILE function for DELETEs; if the row doesn't exist, it
   calls function #1 to populate its existence, then deletes the
   necessary bits to make it accurate.
5) A VOLATILE function for UPDATEs; if the row doesn't exist, it
   calls function #1 to populate its existence, then updates the
   necessary bits to make it accurate.  It's not uncommon for me
   not to have an UPDATE function/trigger.

Create triggers for functions 2-5, and test away.  It's MVCC, and
searching an indexed table for a single row is obviously vastly faster
than a seqscan/aggregate.  If I need any kind of an aggregate to be
fast, I use this system with a derivation of the above table.  The one
problem is that I have to retrain others to use cache_count(), or some
other function, instead of COUNT(*).
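
As a rough sketch, the INSERT flavor (#3) might look something like
this (names are hypothetical, error handling omitted):

CREATE FUNCTION public.cache_count_ins() RETURNS TRIGGER AS '
BEGIN
    -- Bump the cached COUNT for this table; assumes the private
    -- function (#1) has already populated the row.
    UPDATE global.dba_aggregate_cache
       SET val_bigint = val_bigint + 1
     WHERE dbl = TG_RELNAME AND op = ''COUNT'';
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER some_table_count_ins AFTER INSERT ON some_table
    FOR EACH ROW EXECUTE PROCEDURE public.cache_count_ins();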

That said, it'd be nice if there were a way to tell PostgreSQL to do
the above for you and teach COUNT(*), SUM(*), or other aggregates to
use an MVCC-backed cache similar to the above.  If people want their
COUNTs to be fast, then they have to live with the INSERT, UPDATE,
DELETE cost.  The above doesn't work with anything complex such as
joins, but it's certainly a start and I think it satisfies everyone's
gripes other than the tuple churn that _does_ happen (*nudge nudge*,
pg_autovacuum could be integrated into the backend to handle this).
For those worried about performance, the pages that are constantly
being recycled would likely stay in disk cache (PG's or the OS's).
There's still some commit overhead, but still... no need to
over-optimize by requiring the table to be stored in the outdated,
slow, and overused shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution that I use and it
works quite well.  I didn't explain the use of the qual column, but I
think those who grasp the above way of handling things probably grok
how to use the qual column in a dynamically executed query.

CREATE AGGREGATE CACHE anyone?

-sc

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-09 Thread Sean Chittenden
> Yeah, I had a similar thought to Oliver's and suspected that this
> would be the answer.  Also, while it's not too hard to do this for a
> single platform, it gets complicated once you start looking at
> different ones.
> 
> Josh, let me know when you're ready to do this.  I'll try to help,
> although my perl's kind of rusty.  Also, can you even assume perl for
> a postgres install?  Does Solaris, for instance, come with perl?

Um, why not wait until the C version of initdb is committed, then
stake out a section that'll allow us to submit patches to have initdb
autotune to our hearts' content?  There's a tad bit of precedent in
having shared_buffers automatically set in initdb, so why not continue
with it?  I know under FreeBSD initdb will have some #ifdef's wrapped
around the sysctl() syscall to get info about kernel bits.  Talking
about how to handle this gracefully for a gazillion different
platforms might be a more useful discussion at this point, because I'm
sure people on each OS will be able to contribute the necessary
patches to extract info from their OS so that initdb can make useful
decisions.  Or, lastly, does anyone think that this should be in a
different, external program?  -sc

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-10 Thread Sean Chittenden
> NB> So far:
> 
> NB> shared_buffers = 1/16th of total memory
> NB> effective_cache_size = 80% of the supposed kernel cache.
> 
> Please take into account the blocksize compiled into PG, too...

Would anyone object to a patch that exports the block size via a
read-only GUC?  Too many tunables are page-dependent, which is
infuriating when copying configs from DB to DB.  I wish pgsql had some
notion of percentages, so values could end with a '%'.
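
Such a GUC might look something like this (name hypothetical, just to
illustrate the idea):

SHOW block_size;
-- or, for use in scripts:
SELECT current_setting('block_size');

-sc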

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-11 Thread Sean Chittenden
> >NB> shared_buffers = 1/16th of total memory
> >NB> effective_cache_size = 80% of the supposed kernel cache.
> 
> I think Sean(?) mentioned this one for FreeBSD (Bash code):

sh, not bash.  :)

> echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
> 
> I've used it for my dedicated servers.  Is this calculation correct?

Yes, or it's real close at least.  vfs.hibufspace is the amount of
kernel space that's used for caching IO operations (minus the
necessary space taken for the kernel).  If you're real paranoid, you
could do some kernel profiling and figure out how much of the cache is
actually disk IO and multiply the above by some percentage, say 80%.
I haven't found it necessary to do so yet.  Since hibufspace is all IO
and caching, and any net activity there is kinda pointless to worry
about, I assume that 100% of it is used for a disk cache and don't use
a multiplier.  The 8192, however, is the size of a PG page, so if you
tweak PG's page size, you have to change this constant (*grumbles*).
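
For example (made-up number): if vfs.hibufspace comes back as
209715200 (200MB), then with stock 8K pages you'd get
effective_cache_size = 209715200 / 8192 = 25600.  With 16K pages the
divisor becomes 16384, giving 12800.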

-sc

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-13 Thread Sean Chittenden
> >> echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
> >> 
> >> I've used it for my dedicated servers.  Is this calculation correct?
> 
> SC> Yes, or it's real close at least.  vfs.hibufspace is the amount of
> SC> kernel space that's used for caching IO operations (minus the
> 
> I'm just curious if anyone has a tip to increase the amount of
> memory FreeBSD will use for the cache?

Recompile your kernel with BKVASIZE set to 4 times its current value
and double your nbuf size.  According to Bruce Evans:

"Actually there is a way: the vfs_maxbufspace gives the amount of
space reserved for buffer kva (= nbuf * BKVASIZE).  nbuf is easy to
recover from this, and the buffer kva space may be what is wanted
anyway."
[snip]
"I've never found setting nbuf useful, however.  I want most
parametrized sizes including nbuf to scale with resource sizes, and
it's only with RAM sizes of similar sizes to the total virtual address
size that its hard to get things to fit.  I haven't hit this problem
myself since my largest machine has only 1GB.  I use an nbuf of
something like twice the default one, and a BKVASIZE of 4 times the
default.  vfs.maxbufspace ends up at 445MB on the machine with 1GB, so
it is maxed out now."

YMMV.

-sc

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-13 Thread Sean Chittenden
> > > PostgreSQL requires some more shared memory to cache some
> > > tables, x Mb, do you want to increase your OS kernel parameters?
> > >
> > >Tweak shmmax and shmall
> >
> > Note that this still requires a kernel recompile on FreeBSD :(
> 
> Not our fault, now is it?  This would mean that we wouldn't be able
> to script for FreeBSD.  Bug the FreeBSD developers.

And if you do so, you're going to hear that shm* is an antiquated
interface that's dated, slow, inefficient and shouldn't be used.  :)

Every few months one of the uber core BSD hackers threatens to rewrite
that part of PG because high up in the BSD camp, it's common belief
that shm* is a source of performance loss for PostgreSQL.  One of
these days it'll happen, probably by mmap()'ing MAP_SHARED files
stored in a $PGDATA/data/shared dir, as mmap() is by far and away the
fastest shared memory mechanism and certainly is very widely deployed
(I would be surprised if any of the supported PG platforms didn't have
mmap()).

-sc

-- 
Sean Chittenden



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-14 Thread Sean Chittenden
> >> This would be parameters such as the block size and a few
> >> other compile-time parameters.  If we can get at some of these
> >> read-only parameters, then that would make this step easier,
> >> certainly for the new recruits amongst us.
> >
> > Actually, from my perspective, we shouldn't bother with this; if an admin
> > knows enough to set an alternate block size for PG, then they know
> > enough to tweak the conf file by hand.  I think we should just issue a
> > warning that this script:
> > 1) does not work for anyone who is using non-default block sizes,
> 
> There was some talk, either on this list or on freebsd-performance,
> about setting the default block size for PostgreSQL running on
> FreeBSD to 16k for performance reasons.  That is: *default* for the
> port, the user is not asked.

Real quick: this isn't true.  The block size is tunable, but the
default does not change.  You can set PGBLOCKSIZE to the values "16K"
or "32K" to change the block size, but the default remains 8K.

http://lists.freebsd.org/pipermail/freebsd-database/2003-October/000111.html

-sc

-- 
Sean Chittenden



Re: [PERFORM] PostgreSQL 7.4beta5 vs MySQL 4.0.16 with RT(DBIx::SearchBuilder)

2003-10-29 Thread Sean Chittenden
> So it's not just PostgreSQL that is suffering from the bad SQL, but
> MySQL also.  But the question is why does PostgreSQL suffer so
> badly?  I think not all developers write very nice SQL.
> 
> It's really sad to see that a fine piece of work (RT) is performing
> sub-optimally because of malformed SQL.  [ especially on the database
> of my choice ;-) ]

Post EXPLAIN ANALYZE output for the queries you're running; then maybe
you'll be able to get some useful help from this list.  Until then,
it's very hard to speculate as to why PostgreSQL is slower.
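
Something along these lines for each slow query (table and qual are
made up, any of RT's offenders will do):

EXPLAIN ANALYZE
SELECT * FROM tickets WHERE status = 'open';

-sc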

-- 
Sean Chittenden
