Re: [PERFORM] [SQL] sql performance and cache

2003-10-14 Thread Tom Lane
Chris Faulkner [EMAIL PROTECTED] writes:
 I am seeing this message in my logs.
 bt_fixroot: not valid old root page

That's not good.  I'd suggest reindexing that index.
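
Something along these lines is usually all it takes; the index name here is
only a placeholder for whatever your log actually reports:

-- Rebuild just the index the log complains about (placeholder name):
REINDEX INDEX my_suspect_index;
-- or rebuild every index on the table it belongs to:
REINDEX TABLE my_table;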

regards, tom lane



[PERFORM] Large Text Search Help

2003-10-14 Thread psql-mail
Hi,
I am trying to design a large text search database.

It will have upwards of 6 million documents, along with metadata on each.

I am currently looking at tsearch2 to provide fast text searching and 
also playing around with different hardware configurations.

1. With tsearch2 I get very good query times up until I insert more
records. For example, with 100,000 records tsearch2 returns in around 6
seconds; with 200,000 records it returns in just under a minute.
Is this due to the indices fitting entirely in memory with 100,000
records?

2. As well as whole-word matching I also need to be able to do
substring matching. Is the FTI module the way to approach this?

3. I have just begun to look into distributed queries. Is there an
existing solution for distributing a PostgreSQL database amongst
multiple servers, so each has the same schema but only a subset of the
total data?

Any other helpful comments or suggestions on how to improve query times
using different hardware or software techniques would be appreciated.

Thanks,

Mat



Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-14 Thread Ivan Voras
 Date: Sun, 12 Oct 2003 13:30:45 -0700
 From: Josh Berkus [EMAIL PROTECTED]
 To: Nick Barr [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: go for a script! / ex:  PostgreSQL vs. MySQL
 Message-ID: [EMAIL PROTECTED]


 This would be parameters such as the block size and a few other
 compile time parameters. If we can get to some of these read-only
 parameters then that would make this step easier, certainly for the new
 recruits amongst us.

 Actually, from my perspective, we shouldn't bother with this; if an admin
 knows enough to set an alternate block size for PG, then they know
 enough to tweak the Conf file by hand.  I think we should just issue a
 warning that this script:
 1) does not work for anyone who is using non-default block sizes,

There was some talk, either on this list or on freebsd-performance, about
setting the default block size for PostgreSQL running on FreeBSD to 16k
for performance reasons. That is: the *default* for the port; the user is not
asked. So an automagical method that handles non-default block sizes is very
much needed.

 2) may not work well for anyone using unusual locales, optimization
 flags, or other non-default compile options except for language
 interfaces.

Depends on what you consider 'unusual'? I hope not things like iso8859-x
(or, to be exact, European languages) :)


--
Logic is a systematic method of coming to the wrong conclusion with
confidence.




Re: [PERFORM] Sun performance - Major discovery!

2003-10-14 Thread Kenneth Marshall
Jeff,

My first concern with the -fast option is that it makes an executable
that is specific to the platform on which the compilation is run,
unless other flags are given. My second concern is the effect it has
on IEEE floating-point behavior with respect to rounding, error handling,
and so on. My third concern is that if you use -fast, all other code must
be compiled and linked with the -fast option for correct operation;
this includes any language interfaces such as Perl, Python, R, ...
That is a pretty big requirement for a default compilation flag.

Ken Marshall

On Thu, Oct 09, 2003 at 12:07:20PM -0400, Jeff wrote:
 On Thu, 9 Oct 2003, Bruce Momjian wrote:
 
   52 seconds to 19-20 seconds
 
  Wow, that's dramatic.  Do you want to propose some flags for non-gcc
  Solaris?  Is -fast the only one?  Is there one that suppresses those
  warnings or are they OK?
 
 
 Well. As I said, I didn't see an obvious way to hide those warnings.
 I'd love to make those warnings go away.  That is why I suggested perhaps
 printing a message to ensure the user knows that warnings may be printed
 when using sunsoft.
 
 -fast should be all you need - it picks the best settings to use for the
 platform that is doing the compile.
 
 
 --
 Jeff Trout [EMAIL PROTECTED]
 http://www.jefftrout.com/
 http://www.stuarthamm.net/
 
 
 



Re: [PERFORM] One or more processor ?

2003-10-14 Thread Relaxin

 Do you know of any RDBMS that actually will execute a single query on
 multiple processors?

SQL Server does, in a sense.  It can split a query onto multiple threads
(which could possibly use multiple processors), then merge the results
from those threads and send the final result set to the client.






Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-14 Thread Josh Berkus
Ivan,

 There was some talk, either on this list or on freebsd-performance, about
 setting the default block size for PostgreSQL running on FreeBSD to 16k
 for performance reasons. That is: the *default* for the port; the user is
 not asked. So an automagical method that handles non-default block sizes is
 very much needed.

Hmmm ... possibly.   My concern is that if someone uses a very non-default 
value, such as 256K, then they are probably better off doing their own tuning 
because they've got an unusual system.  However, we could easily limit it to 
the range of 4K to 32K.

Of course, since there's no GUC var, we'd have to ask the user to confirm 
their block size.  I'm reluctant to take this approach because if the user 
gets it wrong, then the settings will be *way* off ... and possibly cause 
PostgreSQL to be unrunnable or have out of memory crashes.

Unless there's a way to find it in the compiled source?

  2) may not work well for anyone using unusual locales, optimization
  flags, or other non-default compile options except for language
  interfaces.

 Depends on what you consider 'unusual'? I hope not things like iso8859-x
 (or, to be exact, European languages) :)

On second thought, I'm not sure what an unusual locale would be.  Scratch 
that.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Large Text Search Help

2003-10-14 Thread Josh Berkus
Mat,

 1. With tsearch2 I get very good query times up until I insert more
 records. For example, with 100,000 records tsearch2 returns in around 6
 seconds; with 200,000 records it returns in just under a minute.
 Is this due to the indices fitting entirely in memory with 100,000
 records?

Maybe, maybe not.  If you want a definitive answer, post your EXPLAIN ANALYZE
results with the original query.

I assume that you have run VACUUM ANALYZE, first?  Don't bother to respond 
until you have.
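
Roughly like this; the table, column, and search terms below are only
placeholders standing in for your own:

VACUUM ANALYZE documents;
EXPLAIN ANALYZE
  SELECT doc_id
  FROM documents
  WHERE doc_vector @@ to_tsquery('default', 'example & search');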

 2. As well as whole-word matching I also need to be able to do
 substring matching. Is the FTI module the way to approach this?

Yes.

 3. I have just begun to look into distributed queries. Is there an
 existing solution for distributing a PostgreSQL database amongst
 multiple servers, so each has the same schema but only a subset of the
 total data?

No, it would be ad-hoc.  So far, Moore's law has prevented us from needing to 
devote serious effort to the above approach.

 Any other helpful comments or suggestions on how to improve query times
 using different hardware or software techniques would be appreciated.

Read the archives of this list.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] further testing on IDE drives

2003-10-14 Thread scott.marlowe
On Tue, 14 Oct 2003, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  open_sync was WAY faster at this than the other two methods.
 
 Do you not have open_datasync?  That's the preferred method if
 available.

Nope, when I try to start postgresql with it set to that, I get this error 
message:

FATAL:  invalid value for wal_sync_method: open_datasync

This is on RedHat 9, but I have the same problem on a RH 7.2 box as well.
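
For anyone checking their own box, the running server will tell you which
method it actually ended up using (plain psql):

SHOW wal_sync_method;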




Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Unless there's a way to find it in the compiled source?

See pg_controldata.

regards, tom lane



[PERFORM] free space map usage

2003-10-14 Thread Jeremy M. Guthrie

Is there any way to determine how much of the free space map is currently in
use? (i.e., where and what is it tracking?)  I vacuum on a regular basis, but
my disk space usage never holds steady.  I jacked up the free space map
pages but this doesn't appear to be working.

shared_buffers = 29400# 2*max_connections, min 16
max_fsm_relations = 1000# min 10, fsm is free space map
max_fsm_pages = 1000  # min 1000, fsm is free space map

- -- 
Jeremy M. Guthrie
Systems Engineer
Berbee
5520 Research Park Dr.
Madison, WI  53711
Phone:  608-298-1061

Berbee...Decade 1.  1993-2003
attachment: archive.png


Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-14 Thread Sean Chittenden
  This would be parameters such as the block size and a few
  other compile time parameters. If we can get to some of these
  read-only parameters then that would make this step easier,
  certainly for the new recruits amongst us.
 
  Actually, from my perspective, we shouldn't bother with this; if an admin
  knows enough to set an alternate block size for PG, then they know
  enough to tweak the Conf file by hand.  I think we should just issue a
  warning that this script:
  1) does not work for anyone who is using non-default block sizes,
 
 There was some talk, either on this list or on freebsd-performance,
 about setting the default block size for PostgreSQL running on
 FreeBSD to 16k for performance reasons. That is: the *default*
 for the port; the user is not asked.

Real quick: this isn't true.  The block size is tunable, but the default
does not change.  You can set PGBLOCKSIZE to 16K or 32K to change the
block size, but the default remains 8K.

http://lists.freebsd.org/pipermail/freebsd-database/2003-October/000111.html

-sc

-- 
Sean Chittenden



Re: [PERFORM] free space map usage

2003-10-14 Thread Tom Lane
Jeremy M. Guthrie [EMAIL PROTECTED] writes:
 Is there any way to determine how much of the free space map is currently in
 use? (i.e., where and what is it tracking?)  I vacuum on a regular basis, but
 my disk space usage never holds steady.

Not in 7.3 AFAIR.  In 7.4 a full-database VACUUM VERBOSE will end with
the info you want:

regression=# vacuum verbose;
... much cruft ...
INFO:  free space map: 11 relations, 144 pages stored; 272 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
VACUUM
regression=#

This tells me I'm only using about 1% of the FSM space (272 out of 20000
page slots).

 I jacked up the free space map
 pages but this doesn't appear to be working.

You know you have to restart the postmaster to make those changes take
effect, right?

regards, tom lane



Re: [PERFORM] Any issues with my tuning...

2003-10-14 Thread Neil Conway
On Mon, 2003-10-13 at 15:43, David Griffiths wrote: 
 Here is part of the contents of my sysctl.conf file (note that I've
 played with values as low as 60 with no difference):
 kernel.shmmax=14
 kernel.shmall=14

This is only a system-wide limit -- it either allows the shared memory
allocation to proceed, or it does not. Changing it will have no other
effect on the performance of PostgreSQL.

 ->  Index Scan using comm_ent_usr_acc_id_i on commercial_entity ce
       (cost=0.00..4787.69 rows=78834 width=24) (actual time=0.02..55.64 rows=7991 loops=1)

Interesting that we get this row count estimate so completely wrong
(although it may or may not have anything to do with the actual
performance problem you're running into). Have you run ANALYZE on this
table recently? If so, does increasing this column's statistics target
(using ALTER TABLE ... ALTER COLUMN ... SET STATISTICS) improve the row
count estimate?
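
Concretely, something like this; the column name below is only a guess based
on the index name, so substitute the real one:

ALTER TABLE commercial_entity ALTER COLUMN usr_acc_id SET STATISTICS 100;
ANALYZE commercial_entity;
-- then re-run EXPLAIN ANALYZE and compare the estimated row count to the actual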

-Neil





Re: [SQL] [PERFORM] sql performance and cache

2003-10-14 Thread scott.marlowe
On Tue, 14 Oct 2003, Wei Weng wrote:

 On Sat, 11 Oct 2003, Christopher Kings-Lynne wrote:
 
  
   I have two very similar queries which I need to execute. They both have
   exactly the same from / where conditions. When I execute the first, it takes
   about 16 seconds. The second is executed almost immediately after, it takes
   13 seconds. In short, I'd like to know why the query result isn't being
   cached and any ideas on how to improve the execution.
  
  snip
  
   OK - so I could execute the query once, and get the maximum size of the
   array and the result set in one. I know what I am doing is less than optimal
   but I had expected the query results to be cached. So the second execution
   would be very quick. So why aren't they ? I have increased my cache size -
   shared_buffers is 2000 and I have doubled the default max_fsm... settings
   (although I am not sure what they do). sort_mem is 8192.
  
  PostgreSQL does not have, and has never had a query cache - so nothing 
  you do is going to make that second query faster.
  
  Perhaps you are confusing it with the MySQL query cache?
  
  Chris
  
 Is there a plan to develop one (a query cache)?

Not really; PostgreSQL's design makes it a bit of a non-winner.




Re: [PERFORM] sql performance and cache

2003-10-14 Thread Rod Taylor
  Perhaps you are confusing it with the MySQL query cache?

 Is there a plan to develop one (a query cache)?

For the most part, prepared queries and cursors -- both of which we do
have -- give you a greater advantage due to their versatility.

In the cases where an actual cache is useful, the client application
could do it just as easily or temp tables can be used.
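
Rough sketches of all three, using hypothetical table and column names:

-- Prepared query: plan once, execute many times with different parameters.
PREPARE doc_lookup (integer) AS
    SELECT * FROM documents WHERE doc_id = $1;
EXECUTE doc_lookup(42);

-- Cursor: fetch a large result set incrementally inside a transaction.
BEGIN;
DECLARE doc_cur CURSOR FOR SELECT * FROM documents ORDER BY doc_id;
FETCH 100 FROM doc_cur;
CLOSE doc_cur;
COMMIT;

-- Poor man's cache: materialize an expensive result in a temp table and
-- query that instead of re-running the original query.
CREATE TEMP TABLE fresh_docs AS
    SELECT * FROM documents WHERE added > now() - interval '1 day';
SELECT count(*) FROM fresh_docs;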

I suspect it would be implemented more as a caching proxy than as an
actual part of PostgreSQL, should someone really want this feature.




Re: [PERFORM] backup/restore - another area.

2003-10-14 Thread markw
Jeff,

I'm curious what kind of testing you've done with LVM.  I'm not
currently trying any backup/restore stuff, but I'm running our DBT-2
workload using LVM.  I've started collecting vmstat, iostat, and
readprofile data, initially running disktest to gauge the performance.

For anyone curious, I have some data on a 14-disk volume here:
http://developer.osdl.org/markw/lvm/results.4/log/

and a 52-disk volume here:
http://developer.osdl.org/markw/lvm/results.5/data/

Mark

Jeff [EMAIL PROTECTED] writes:

 Idea #1:
 Use an LVM and take a snapshot - archive that.
 From the way I see it, the downside is the LVM will use a lot of space
 until the snapshot is removed.  Also PG may be in a slightly inconsistent
 state - but this should appear to PG the same as if the power went out.

 For restore, simply unarchive this snapshot and point postgres at it. Let
 it recover and you are good to go.

 Little overhead from what I see...
 I'm leaning towards this method the more I think of it.



Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-14 Thread Peter Eisentraut
Marko Karppinen writes:

 GCC sets __FAST_MATH__ even if you counter a -ffast-math with the
 negating flags above. This means that it is not currently possible to
 use the -fast flag when compiling PostgreSQL at all. Instead, you have
 to go through all the flags Apple is setting and only pass on those
 that don't break pg.

That sounds perfectly reasonable to me.  Why should we develop elaborate
workarounds for compiler flags that are known to create broken code?  I
also want to point out that I'm getting kind of tired of developing more
and more workarounds for sloppy Apple engineering.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-14 Thread Marko Karppinen
On 8.10.2003, at 21:31, Bruce Momjian wrote:
 Well, this is really embarrassing.  I can't imagine why we would not set
 at least -O on all platforms.  Looking at the template files, I see
 these have no optimization set:

 darwin

Regarding Darwin optimizations, Apple has introduced a -fast flag in 
their GCC 3.3 version that they recommend when compiling code for their 
new G5 systems. Because of this, I foresee a lot of people defining 
CFLAGS=-fast on their systems.

This is problematic for PostgreSQL, however, since the -fast flag is 
the equivalent of:

-O3 -falign-loops-max-skip=15 -falign-jumps-max-skip=15 
-falign-loops=16 -falign-jumps=16 -falign-functions=16 -malign-natural 
-ffast-math -fstrict-aliasing -frelax-aliasing -fgcse-mem-alias  
-funroll-loops -floop-transpose -floop-to-memset -finline-floor 
-mcpu=G5 -mpowerpc64 -mpowerpc-gpopt -mtune=G5 -fsched-interblock 
-fload-after-store  --param max-gcse-passes=3  -fno-gcse-sm  
-fgcse-loop-depth -funit-at-a-time  -fcallgraph-inlining  
-fdisable-typechecking-for-spec

At least the -ffast-math part causes problems, seeing that PostgreSQL
actually checks for the __FAST_MATH__ macro to make sure that it isn't
turned on. There might be other problems with Apple's flags, but I
think that the __FAST_MATH__ check should be altered.

As you know, setting -ffast-math in GCC is the equivalent of setting
-fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math,
-ffinite-math-only and -fno-signaling-nans. What really should be done,
I think, is adding the opposites of these flags (-fmath-errno,
-fno-unsafe-math-optimizations, -ftrapping-math, -fno-finite-math-only
and -fsignaling-nans) to the command line if __FAST_MATH__ is detected.
This would allow people to use CFLAGS=-fast on their G5s, beat some 
Xeon speed records, and not worry about esoteric IEEE math standards. 
What do you guys think?

GCC sets __FAST_MATH__ even if you counter a -ffast-math with the 
negating flags above. This means that it is not currently possible to 
use the -fast flag when compiling PostgreSQL at all. Instead, you have 
to go through all the flags Apple is setting and only pass on those 
that don't break pg.

mk
