Re: [PERFORM] Caching by Postgres

2005-08-26 Thread Thomas Ganss



The first thing to always remember is that the move from 32 bits to
64 bits doesn't come for free. In a real 64-bit system with a
64-bit operating system, and 64-bit applications, pointers are
now double their 32-bit size. This means more bytes to copy around
memory, and in an extreme case it has the potential to nearly
double the memory traffic needed to fetch many such pointers from
RAM, and to halve the effective amount of RAM. In real-world cases,
not everything is a pointer, so this sort of performance degradation
is unlikely to be that severe - but it is something to keep in mind.

In addition to the above, larger pointers lessen the effectiveness of the
CPU cache, so be sure to pick the larger-cache versions of a CPU if you
have structures that need to fit into the cache...


my 0.02 EUR

thomas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread Donald Courtney



Great discussion and illuminating for those of us who are still
learning the subtleties of Postgres.

William

To be clear -
I built PostgreSQL 8.1 64-bit on Solaris 10 a few months ago
and, side by side with the 32-bit PostgreSQL build, saw no improvement.
In fact the 64-bit result was slightly lower.


I used the *same 64-bit S10 OS* for both versions.  I think your
experience makes sense since your change was from 32- to 64-bit Linux.
From my experiment I am surmising that there will not be any
file/OS/buffer-cache
scale-up effect on the same OS with 64-bit PostgreSQL.


I was testing on a 4 core system in both cases.



William Yu wrote:


Donald Courtney wrote:


in that even if you ran PostgreSQL on a 64-bit address space
with a larger number of CPUs you won't see much of a scale-up
and possibly even a drop.   I am not alone in having the *expectation*



What's your basis for believing this is the case? Why would 
PostgreSQL's dependence on the OS's caching/filesystem limit 
scalability? I know when I went from 32bit to 64bit Linux, I got 
*HUGE* increases in performance using the same amount of memory. And 
when I went from 2x1P to 2xDC, my average cpu usage % dropped almost 
in half.



that a database should have some cache size parameter and
the option to skip the file system.   If I use oracle, sybase, mysql
and maxdb they all have the ability to size a data cache and move
to 64 bits.



Josh Berkus has already mentioned this as conventional wisdom as 
written by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc 
has been around for a long time; it was probably a clear performance 
win way back when. Nowadays with how far open-source OS's have 
advanced, I'd take it with a grain of salt and do my own performance 
analysis. I suspect the big vendors wouldn't change their stance even 
if they knew it was no longer true due to the support hassles.


My personal experience with PostgreSQL. Dropping shared buffers from 
2GB to 750MB improved performance on my OLTP DB a good 25%. Going down 
from 750MB to 150MB was another +10%.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread Stephen Frost
* Donald Courtney ([EMAIL PROTECTED]) wrote:
 To be clear -
 I built PostgreSQL 8.1 64-bit on Solaris 10 a few months ago
 and, side by side with the 32-bit PostgreSQL build, saw no improvement.
 In fact the 64-bit result was slightly lower.

That makes some sense actually.  It really depends on what you're doing
a lot of the time.  On a Sparc system you're not likely to get much of a
speed improvement by going to 64bit (unless, maybe, you're doing lots of
intensive 64bit math ops).  You'll have larger pointers and whatnot
though.

 I used  the *same 64 bit S10 OS* for both versions.  I think your
 experience makes sense since your change was from 32 to 64 bit Linux.

32bit to 64bit Linux on a Sparc platform really shouldn't affect
performance all that much (I'd expect it to be similar to 32bit to 64bit
under Solaris actually, at least in terms of the performance
difference).  32bit to 64bit Linux on an amd64 platform is another
matter entirely though, but not because of the number of bits involved.

Under amd64, 32bit code is limited to the i386 instruction set, which
has a limited number of registers and whatnot.  Under amd64/64bit you
get more registers (and I think some other niceties) which will improve
performance.  That's not a 32bit vs. 64bit thing, that's i386 vs. native
amd64.  It's really mainly an oddity of the platform.  On a mips system
I'd expect the same kind of performance difference between 32bit and
64bit as you'd see on a sparc platform.

Enjoy,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread PFC


Really?  Cool, I'd like to see that.  Could you follow up with Hans?
Or give me his e-mail?


You can subscribe to the Reiser mailinglist on namesys.com or :
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread mark
On Wed, Aug 24, 2005 at 09:21:12AM -0400, Donald Courtney wrote:
 I built PostgreSQL 8.1 64-bit on Solaris 10 a few months ago
 and, side by side with the 32-bit PostgreSQL build, saw no improvement.
 In fact the 64-bit result was slightly lower.

I've had this sort of argument with a friend of mine who works at a
retail computer sales company who always tries to pitch 64-bit
platforms to me (I don't have one yet).

There are a few issues in here that are hard to properly detach to
allow for a fair comparison.

The first thing to always remember is that the move from 32 bits to
64 bits doesn't come for free. In a real 64-bit system with a
64-bit operating system, and 64-bit applications, pointers are
now double their 32-bit size. This means more bytes to copy around
memory, and in an extreme case it has the potential to nearly
double the memory traffic needed to fetch many such pointers from
RAM, and to halve the effective amount of RAM. In real-world cases,
not everything is a pointer, so this sort of performance degradation
is unlikely to be that severe - but it is something to keep in mind.
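
To make the pointer-size effect concrete, here is a minimal C sketch
(illustrative only, not PostgreSQL source): the same linked-list node
compiled for a 32-bit target and for an LP64 target differs in size
purely because of pointer width.

    #include <stdio.h>

    struct node {
        struct node *next;   /* 4 bytes on ILP32, 8 bytes on LP64 */
        struct node *prev;
        int          key;    /* 4 bytes on either target */
    };

    int main(void)
    {
        /* Prints 12 on a typical 32-bit build, 24 on LP64
           (8 + 8 + 4, padded to 24 for alignment). */
        printf("sizeof(void *)      = %zu\n", sizeof(void *));
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }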

In response to this, it appears that, at least on the Intel/AMD side
of things, they've increased the bandwidth on the motherboard, and
allowed for faster memory to be connected to the motherboard. They've
increased the complexity of the chip, to allow 64-bit register
operations to be equivalent in speed to 32-bit register operations.
I have no idea what else they've done... :-)

So, it may be difficult to properly compare a 32-bit system to a
64-bit system. Even if the GHz on the chip appears equal, it isn't
the same chip, and unless it is the exact same make, product and
version of the motherboard, it may not be a fair comparison. Turning
support for 32-bit on or off, and using a kernel that is only 32-bit
may give good comparisons - but with the above explanation, I would
expect the 32-bit application + kernel to out-perform the 64-bit
application.

So then we move on to what 64-bit is really useful for. Obviously,
there is the arithmetic. If you were previously doing 64-bit
arithmetic through software, you will notice an immediate speed
improvement when doing it through hardware instead. If you have
a program that is scanning memory in any way, it may benefit from
64-bit instructions (for example - copying data 64-bit words at
a time instead of 32-bit words at a time). PostgreSQL might benefit
slightly from either of these, slightly balancing the performance
degradation of using more memory to store the pointers, and more
memory bandwidth to access the pointers.
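
For what it's worth, here is a rough sketch of what "64-bit arithmetic
through software" means; the type and helper below are hypothetical, but
a compiler targeting a 32-bit-only CPU lowers a 64-bit add into
essentially this sequence:

    #include <stdint.h>

    typedef struct { uint32_t lo, hi; } u64_sw;   /* emulated 64-bit value */

    static u64_sw add64(u64_sw a, u64_sw b)
    {
        u64_sw r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
        return r;
    }
    /* On a 64-bit CPU the same operation is a single add instruction. */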

The real benefit of 64-bit is address space. From the kernel
perspective, it means that more programs, or bigger programs can run
at once. From the application perspective, it means your application
can use more than 32-bits of address space. For programs that make
extensive use of mmap(), this can be a necessity. They are mapping
very large files into their own address space. This isn't a
performance boost as much as it is a 'you can't do it' if the
files mmap()'ed at the same time will not fit within 32 bits of
address space. This also becomes, potentially, a performance
degradation, as the system is now having to manage applications
that have very large page tables. Page faults may become
expensive.

PostgreSQL uses read() instead of mmap(), and splits large tables into
1 GB segment files by default. PostgreSQL doesn't require the additional
address space for normal operation.
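
A minimal sketch of the read()-versus-mmap() distinction described above
(both helpers are illustrative, not PostgreSQL code): read() streams a
file of any size through a small fixed buffer, so it fits in a 32-bit
address space, while mmap() must reserve address space for the whole
mapped range up front.

    #include <sys/mman.h>
    #include <unistd.h>

    static void scan_with_read(int fd)
    {
        char buf[8192];                       /* one block at a time */
        while (read(fd, buf, sizeof buf) > 0)
            ;                                 /* ... process buf ... */
    }

    static void *scan_with_mmap(int fd, size_t filesize)
    {
        /* Requires 'filesize' bytes of contiguous address space; this
           fails or fragments as filesize approaches the 2-4 GB a
           32-bit process can address. */
        return mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
    }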

If, however, you happen to have a very large amount of physical memory
- more memory than is supported by a 32-bit system, but is supported
by your 64-bit system, then the operating system should be able to use
this additional physical memory to cache file system data pages, which
will benefit PostgreSQL if used with tables that are larger than the
memory supported by your 32-bit system, and which have queries which
require more pages than the memory supported by your 32-bit system to
be frequently accessed. If you have a huge database, with many clients
accessing the data, this would be a definite yes. With anything less,
it is a maybe, or a probably not.

I've been looking at switching to 64-bit, mostly to benefit from the
better motherboard bandwidth, and just to play around. I'm not
expecting to require the 64-bit instructions.

Hope this helps,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread William Yu

Donald Courtney wrote:

I built PostgreSQL 8.1 64-bit on Solaris 10 a few months ago
and, side by side with the 32-bit PostgreSQL build, saw no improvement. In
fact the 64-bit result was slightly lower.


I'm not surprised 32-bit binaries running on a 64-bit OS would be faster 
than 64-bit/64-bit. 64-bit isn't some magical wand you wave and it's all 
ok. Programs compiled as 64-bit will only run faster if (1) you need 
64-bit address space and you've been using ugly hacks like PAE to get 
access to memory > 2GB or (2) you need native 64-bit data types and
you've been using ugly hacks to piece 32-bit ints together (example, 
encryption/compression). In most cases, 64-bit will run slightly slower 
due to extra overhead of using larger datatypes.


Since PostgreSQL hands off the majority of memory management/data 
caching to the OS, only the OS needs to be 64-bit to reap the benefits 
of better memory management. Since Postgres *ALREADY* reaps the 64-bit 
benefit, I'm not sure how the argument moving caching/mm/fs into 
Postgres would apply. Yes there's the point about possibly implementing 
better/smarter/more appropriate caching algorithms but that has nothing 
to do with 64-bit.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread Alan Stange

[EMAIL PROTECTED] wrote:

So then we move on to what 64-bit is really useful for. Obviously,
there is the arithmetic. If you were previously doing 64-bit
arithmetic through software, you will notice an immediate speed
improvement when doing it through hardware instead. If you have
a program that is scanning memory in any way, it may benefit from
64-bit instructions (for example - copying data 64-bit words at
a time instead of 32-bit words at a time). PostgreSQL might benefit
slightly from either of these, slightly balancing the performance
degradation of using more memory to store the pointers, and more
memory bandwidth to access the pointers.
  
At least on Sparc processors, v8 and newer, any double precision math 
(including longs) is performed with a single instruction, just like for 
a 32 bit datum.  Loads and stores of 8 byte datums are also handled via 
a single instruction.   The urban myth that 64bit math is 
different/better on a 64 bit processor is just that;  yes, some lower 
end processors would emulate/trap those instructions, but that's an
implementation detail, not architecture.  I believe that this is all
true for other RISC processors as well.


The 64bit API on UltraSparcs does bring along some extra FP registers IIRC.


If, however, you happen to have a very large amount of physical memory
- more memory than is supported by a 32-bit system, but is supported
by your 64-bit system, then the operating system should be able to use
this additional physical memory to cache file system data pages, which
will benefit PostgreSQL if used with tables that are larger than the
memory supported by your 32-bit system, and which have queries which
require more pages than the memory supported by your 32-bit system to
be frequently accessed. If you have a huge database, with many clients
accessing the data, this would be a definite yes. With anything less,
it is a maybe, or a probably not.
  
Solaris, at least, provided support for far more than 4GB of physical 
memory on 32 bit kernels.  A newer 64 bit kernel might be more 
efficient, but that's just because the time was taken to support large 
page sizes and more efficient data structures.  It's nothing intrinsic 
to a 32 vs 64 bit kernel.


-- Alan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread mark
On Wed, Aug 24, 2005 at 02:47:09PM -0400, Alan Stange wrote:
 At least on Sparc processors, v8 and newer, any double precision math 
 (including longs) is performed with a single instruction, just like for 
 a 32 bit datum.  Loads and stores of 8 byte datums are also handled via 
 a single instruction.   The urban myth that 64bit math is 
 different/better on a 64 bit processor is just that;  yes, some lower 
 end processors would emulate/trap those instructions, but that's an
 implementation detail, not architecture.

It isn't an urban myth that 64-bit math on a 64-bit processor is
faster, at least if done using registers. It definitely is faster.

It may be an urban myth, though, that most applications perform
a sufficient amount of 64-bit arithmetic to warrant the upgrade.
The benefit may be lost in the noise for an application such as
PostgreSQL. It takes, effectively, infinitely longer to access
a disk page than to increment a 64-bit integer in software.

For the lower end processors that emulate/trap these instructions,
they are being performed in software, along with the overhead of a
trap, and are therefore not a single instruction any more. We are
coming at this from different sides (which is good - perspective is
always good :-) ). From the Intel/AMD side of things, ALL non 64-bit
platforms are 'lower end processors', and don't emulate/trap the
instructions as they didn't exist (at least not yet - who knows what
clever and sufficiently motivated people will do :-) ).

 If, however, you happen to have a very large amount of physical memory
 - more memory than is supported by a 32-bit system, but is supported
 by your 64-bit system, then the operating system should be able to use
 this additional physical memory to cache file system data pages, which
 will benefit PostgreSQL if used with tables that are larger than the
 memory supported by your 32-bit system, and which have queries which
 require more pages than the memory supported by your 32-bit system to
 be frequently accessed. If you have a huge database, with many clients
 accessing the data, this would be a definite yes. With anything less,
 it is a maybe, or a probably not.
 Solaris, at least, provided support for far more than 4GB of physical 
 memory on 32 bit kernels.  A newer 64 bit kernel might be more 
 efficient, but that's just because the time was taken to support large 
 page sizes and more efficient data structures.  It's nothing intrinsic 
 to a 32 vs 64 bit kernel.

Hehe. That's why I was so careful to qualify my statements. :-)

But yeah, I agree. It's a lot of hype, for not much gain (and some
loss, depending on what it is being used for). I only want one because
they're built better, and because I want to play around.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread PFC



At least on Sparc processors, v8 and newer, any double precision math  
(including longs) is performed with a single instruction, just like for  
a 32 bit datum.  Loads and stores of 8 byte datums are also handled via  
a single instruction.   The urban myth that 64bit math is  
different/better on a 64 bit processor is just that;  yes, some lower  
end processors would emulate/trap those instructions, but that's an
implementation detail, not architecture.  I believe that this is all
true for other RISC processors as well.


The 64bit API on UltraSparcs does bring along some extra FP registers  
IIRC.


It's very different on x86.
	64-bit x86 like the Opteron has more registers, which are very scarce on  
the base x86 (8 I think). This alone is very important. There are other  
factors as well.


Solaris, at least, provided support for far more than 4GB of physical  
memory on 32 bit kernels.  A newer 64 bit kernel might be more  
efficient, but that's just because the time was taken to support large  
page sizes and more efficient data structures.  It's nothing intrinsic  
to a 32 vs 64 bit kernel.


	Well, on a large working set, a processor which can directly address more
than 4GB of memory will be a lot faster than one which can't and has to
play with the MMU and paging units!


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread Michael Stone

On Wed, Aug 24, 2005 at 03:34:41PM -0400, [EMAIL PROTECTED] wrote:

It isn't an urban myth that 64-bit math on a 64-bit processor is
faster, at least if done using registers. It definitely is faster.
It may be an urban myth, though, that most applications perform
a sufficient amount of 64-bit arithmetic to warrant the upgrade.


The major problem is that the definition of "64-bit processor" is fuzzy.
The major slowdown of 64-bitness is the necessity of carting around
64-bit pointers. It's not, however, necessary to have 64-bit pointers to
get 64-bit registers & fast 64-bit ops. E.g., SGI has n32 & n64 ABIs
which can access exactly the same instruction set & registers; the
difference between them is the size of pointers and whether a long is
the same as a long long. Any discussion of 64-bit processors is
doomed from the start because people tend to start making implementation
assumptions on top of an already vague concept. Current & future
discussions are tinged by the fact that amd64 *doubles* the number
of registers in 64-bit mode, potentially providing a major speedup--but
one that doesn't really have anything to do with being 64-bit.
Pretty much any discussion of 64-bit mode really needs to be a
discussion of a particular ABI on a particular processor; talking about
"64-bit processors" abstractly is a waste of time.

Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-24 Thread mark
On Wed, Aug 24, 2005 at 05:09:04PM -0400, Alan Stange wrote:
 The older 32bit RISC processors do have 64 bit registers, ALUs and 
 datapaths, and they are marketed toward high end scientific computing, 
 and you're claiming that such a processor is slower than one which has 
 the addition of 64 bit pointers added to it?

No. I'm claiming that you are talking about a hybrid 64/32 processor,
and that this isn't fair to declare that 64-bit arithmetic units don't
provide benefit for 64-bit math. :-)

 As an example, an UltraSparc running a 32 bit kernel+application will 
 have the same double precision floating point performance as one 
 running  a 64bit kernel+application (except for the additional FP 
 registers in the 64bit API).  For a function like daxpy, it's the exact 
 same hardware running the exact same instructions!  So why do you think 
 the performance would be different?

Double precision floating point isn't 64-bit integer arithmetic. I think
this is all a little beside the point. If your point is that the SPARC
was designed well - I agree with you.

I won't agree that a SPARC with 64-bit registers should be considered
a 32-bit machine. The AMD 64-bit machines come in two forms as well -
the ones that allow you to use the 64-bit integer registers (not
floating point! those are already 80-bit!), and the ones that allow
you to address more memory. I wouldn't consider either to be a 32-bit
CPU, although they will allow 32-bit applications to run fine.

 I believe the IBM Power processors also upped everything to double 
 precision internally because of some details of the multiply-add fused 
 instructions.  It's been a few years since I taught HP to CS 
 undergrads, but I'm fairly sure the details are all the same for MIPS 
 processors as well. 

Smart design, that obscures the difference - but doesn't make the
difference a myth. If it's already there, then it's already there,
and we can't talk as if it isn't.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread John A Meinel
gokulnathbabu manoharan wrote:
 Hi all,

 I'd like to know the caching policies of PostgreSQL.
 What parameter in postgresql.conf affects the
 cache size used by PostgreSQL?  As far as I have
 searched, the relevant parameters are

In general, you don't. The OS handles caching based on file usage.
So if you are using the files, the OS should cache them. Just like it
does with any other program.


 1. shared_buffers - Sets the limit on the amount of
 shared memory used.  If I take this as the cache
 size then my performance should increase with the
 increase in the size of shared_buffers.  But it seems
 it is not the case and my performance actually
 decreases with the increase in the shared_buffers.  I
 have a RAM size of 32 GB.  The table which I use more
 frequently has around 68 million rows.  Can I cache
 this entire table in RAM?

There is a portion of this which is used for caching. But I believe
before 8.1 there was code that went linearly through all of the
shared_buffers and checked for dirty/clean pages. So there was a
tradeoff that the bigger you make it, the longer that search goes. So
you got diminishing returns, generally around 10k shared buffers.
I think it is better in 8.1, but if the OS is going to cache it anyway
(since it does), then having a Postgres cache is just wasting memory,
and not letting the OS cache as much.
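
A conceptual sketch of why that pre-8.1 behavior penalized large
settings (hypothetical code, not the actual PostgreSQL buffer manager):
if finding a victim buffer means walking the whole array, eviction cost
grows linearly with shared_buffers.

    typedef struct {
        int pinned;     /* currently in use by a backend */
        int dirty;      /* must be written out before reuse */
    } BufferDesc;

    /* O(nbuffers) per eviction: doubling shared_buffers roughly
       doubles the scanning work, eating into any cache-hit gains. */
    static int find_victim(BufferDesc *buf, int nbuffers)
    {
        for (int i = 0; i < nbuffers; i++)
            if (!buf[i].pinned && !buf[i].dirty)
                return i;
        return -1;      /* nothing clean: caller must flush a page */
    }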

So I'm guessing that with 8.1 there would be 2 sweet spots: low
shared_buffers (<= 10k), and really high shared_buffers (like all of
available ram).
But because postgres has been tuned for the former I would stick with it
(I don't think shared_buffers can go > 2GB, but that might just be
work_mem/maintenance_work_mem).


 2. work_mem - It is the amount of memory used by an
 operation.  My guess is once the operation is complete
 this is freed and hence has nothing to do with the
 caching.

 3. effective_cache_size - The parameter used by the
 query planner and has nothing to do with the actual
 caching.

This is important from a planner perspective: the planner can then
expect that the OS is doing its job and caching the tables, so index
scans are cheaper than they would be otherwise.

John
=:-


 So kindly help me in pointing me to the correct
 parameter to set.

 It will be great if you can point me to the docs that
 explains the implementation of caching in Postgresql
 which will help me in understanding things much
 clearly.

 Thanks in advance.
 Gokul.




signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Josh Berkus
John,

 So I'm guessing that with 8.1 there would be 2 sweet spots: low
 shared_buffers (<= 10k), and really high shared_buffers (like all of
 available ram).
 But because postgres has been tuned for the former I would stick with it
 (I don't think shared_buffers can go > 2GB, but that might just be
 work_mem/maintenance_work_mem).

I'll be testing this as soon as we get some issues with the 64bit 
shared_buffer patch worked out.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Bruno Wolff III
On Tue, Aug 23, 2005 at 10:10:45 -0700,
  gokulnathbabu manoharan [EMAIL PROTECTED] wrote:
 Hi all,
 
 I'd like to know the caching policies of PostgreSQL.
 What parameter in postgresql.conf affects the
 cache size used by PostgreSQL?  As far as I have
 searched, the relevant parameters are

The main policy is to let the OS do most of the caching.

 1. shared_buffers - Sets the limit on the amount of
 shared memory used.  If I take this as the cache
 size then my performance should increase with the
 increase in the size of shared_buffers.  But it seems
 it is not the case and my performance actually
 decreases with the increase in the shared_buffers.  I
 have a RAM size of 32 GB.  The table which I use more
 frequently has around 68 million rows.  Can I cache
 this entire table in RAM?

Using extremely large values for shared_buffers is known to be a performance
loss for Postgres. Some improvements were made for 8.0 and more for 8.1.

The OS will cache frequently used data from files for you. So if you are using
that table a lot and the rows aren't too wide, it should mostly be cached
for you by the OS.

 2. work_mem - It is the amount of memory used by an
 operation.  My guess is once the operation is complete
 this is freed and hence has nothing to do with the
 caching.

This is used for sorts and some other things.

 3. effective_cache_size - The parameter used by the
 query planner and has nothing to do with the actual
 caching.

You are supposed to use this to give the planner an idea about how much
space the OS will be using for caching on behalf of Postgres.

 So kindly help me in pointing me to the correct
 parameter to set.
 
 It will be great if you can point me to the docs that
 explains the implementation of caching in Postgresql
 which will help me in understanding things much
 clearly.

You probably want to read the following:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Donald Courtney


I mean well with this comment -
This whole issue of data caching is a troubling issue with PostgreSQL

in that even if you ran PostgreSQL on a 64-bit address space
with a larger number of CPUs you won't see much of a scale-up
and possibly even a drop.   I am not alone in having the *expectation*
that a database should have some cache size parameter and
the option to skip the file system.   If I use oracle, sybase, mysql
and maxdb they all have the ability to size a data cache and move
to 64 bits.

Is this a crazy idea - that a project be started to get this adopted?
Is it too big and structural to contemplate?


From one who likes postgreSQL
dc

Frank Wiles wrote:


On Tue, 23 Aug 2005 10:10:45 -0700 (PDT)
gokulnathbabu manoharan [EMAIL PROTECTED] wrote:

 


Hi all,

I'd like to know the caching policies of PostgreSQL.
What parameter in postgresql.conf affects the
cache size used by PostgreSQL?  As far as I have
searched, the relevant parameters are

1. shared_buffers - Sets the limit on the amount of
shared memory used.  If I take this as the cache
size then my performance should increase with the
increase in the size of shared_buffers.  But it seems
it is not the case and my performance actually
decreases with the increase in the shared_buffers.  I
have a RAM size of 32 GB.  The table which I use more
frequently has around 68 million rows.  Can I cache
this entire table in RAM?
   



 increasing shared_buffers to a point helps, but after
 a certain threshold it can actually degrade performance.

 


2. work_mem - It is the amount of memory used by an
operation.  My guess is once the operation is complete
this is freed and hence has nothing to do with the
caching.
   



 This is the amount of memory used for things like sorts and
 order bys on a per backend process basis. 

 


3. effective_cache_size - The parameter used by the
query planner and has nothing to do with the actual
caching.
   



 This instructs the query planner on how large the operating
 system's disk cache is.  There isn't a built-in cache; PostgreSQL
 relies on the operating system to cache the on-disk information
 based on how often it is used.  In most cases this is probably
 more accurate anyway.


 I wrote an article on PostgreSQL performance tuning that has
 links to several other related sites, you can find it here: 


 http://www.revsys.com/writings/postgresql-performance.html

-
  Frank Wiles [EMAIL PROTECTED]
  http://www.wiles.org
-


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match
 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Josh Berkus
Donald,

 This whole issue of data caching is a troubling issue with PostgreSQL
 in that even if you ran PostgreSQL on a 64-bit address space
 with a larger number of CPUs you won't see much of a scale-up
 and possibly even a drop.

Since when?   Barring the context switch bug, you're not going to get a 
drop with more processors/more RAM.

You may fail to get any gain, though.  If your database is only 100MB in 
size, having 11G of cache space isn't going to help you much over having 
only 1G.   

 I am not alone in having the *expectation* 
 that a database should have some cache size parameter and
 the option to skip the file system.  

Sure, because that's the conventional wisdom, as writ by Oracle.  However, 
this comes with substantial code maintenance costs and portability 
limitations which have to be measured against any gain in performance.  

 If I use oracle, sybase, mysql 
 and maxdb they all have the ability to size a data cache and move
 to 64 bits.

And yet, we regularly outperform Sybase and MySQL on heavy OLTP loads on 
commodity x86 hardware.  So apparently DB caching isn't everything.  ;-)

I'm not saying that it's not worth testing larger database caches -- even 
taking over most of RAM -- on high-speed systems.   In fact, I'm working 
on doing that kind of test now.  However, barring test results, we can't 
assume that taking over RAM and the FS cache would have a substantial 
performance benefit; that remains to be shown.

The other thing is that we've had, and continue to have, low-hanging fruit 
which have a clear and measurable effect on performance and are fixable 
without bloating the PG code.  Some of these issues (COPY path, context 
switching, locks, GiST concurrency, some aggregates) have been addressed 
in the 8.1 code; some remain to be addressed (sorts, disk spill, 64-bit 
sort mem, other aggregates, index-only access, etc.).   Why tackle a huge, 
250-hour project which could fail when a 20-hour patch is more likely to 
provide the same performance benefit?  

We have the same discussion (annually) about mmap.  Using mmap *might* 
provide us with a huge performance boost.  However, it would *definitely* 
require 300 hours (or more) of programmer time to test properly, and might
not benefit us at all.

Of course, if *you* want to work on large database cache improvements, be 
my guest ... it's an open source project!  Submit your patches!  I'll be 
happy to test them.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread mark
On Tue, Aug 23, 2005 at 02:41:39PM -0400, Donald Courtney wrote:
 I mean well with this comment - 
 This whole issue of data caching is a troubling issue with PostgreSQL
 in that even if you ran PostgreSQL on a 64-bit address space
 with a larger number of CPUs you won't see much of a scale-up
 and possibly even a drop.   I am not alone in having the *expectation*
 that a database should have some cache size parameter and
 the option to skip the file system.   If I use oracle, sybase, mysql
 and maxdb they all have the ability to size a data cache and move
 to 64 bits.
 Is this a crazy idea - that a project be started to get this adopted?  
 Is it
 too big and structural to contemplate? 
 From one who likes postgreSQL

Hey Donald. :-)

This is an operating system issue, not a PostgreSQL issue. If you have
more physical memory than fits in 32-bit addresses, and your operating
system isn't using this extra memory to cache files (or anything
else), then your OS is what I would consider to be broken (or at the
very least, not designed for a 64-bit host).

The only questions that can be asked here are: 1) can PostgreSQL do a
better job than the OS at best utilizing system RAM, and 2) if so, is
the net gain worth the added complexity to PostgreSQL?

I happen to think that yes, PostgreSQL can do a better job than most
OS's, as it has better information to make decisions as to which pages
are worth keeping, and which are not, but no, it isn't worth the
effort until PostgreSQL developers start running out of things to do.

Buy your 64-bit platforms - but if page caching is your concern, 1)
ensure that you really have more physical memory than can fit in 32
bits, and 2) ensure that your operating system is comfortable caching
data pages from files above the 32-bit mark.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Michael Stone

On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote:
which have a clear and measurable effect on performance and are fixable 
without bloating the PG code.  Some of these issues (COPY path, context 
switching


Does that include increasing the size of read/write blocks? I've noticed
that with a large enough table it takes a while to do a sequential scan,
even if it's cached; I wonder if the fact that it takes a million
read(2) calls to get through an 8G table is part of that.
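
The arithmetic: an 8G table read in 8K blocks is 2^33 / 2^13 = 2^20,
about a million, read() calls. A quick way to see the effect is a loop
like the following (the file name is hypothetical); bumping the block
size to 256K cuts the syscall count to roughly 32 thousand.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("table_segment", O_RDONLY);  /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        size_t blksz = 256 * 1024;   /* vs. PostgreSQL's 8192-byte BLCKSZ */
        char *buf = malloc(blksz);
        long calls = 0;

        while (read(fd, buf, blksz) > 0)
            calls++;                 /* one syscall per block */

        printf("read() calls: %ld\n", calls);
        free(buf);
        close(fd);
        return 0;
    }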

Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Chris Browne
[EMAIL PROTECTED] (Donald Courtney) writes:
 I mean well with this comment -
 This whole issue of data caching is a troubling issue with PostgreSQL
 in that even if you ran PostgreSQL on a 64-bit address space
 with a larger number of CPUs you won't see much of a scale-up
 and possibly even a drop.   I am not alone in having the *expectation*
 that a database should have some cache size parameter and
 the option to skip the file system.   If I use oracle, sybase, mysql
 and maxdb they all have the ability to size a data cache and move
 to 64 bits.

 Is this a crazy idea - that a project be started to get this
 adopted?  Is it too big and structural to contemplate?

This project amounts to "Implement Your Own Operating System", because
it requires that the DBMS take over the things that operating systems
normally do, like:
 a) Managing access to filesystems and
 b) Managing memory

The world is already sufficiently filled up with numerous variations
of Linux, BSD 4.4 Lite, and UNIX System V; I can't see justification for
reinventing this wheel still again.
-- 
(format nil [EMAIL PROTECTED] cbbrowne acm.org)
http://cbbrowne.com/info/multiplexor.html
Rules of  the Evil Overlord #196.  I will hire an  expert marksman to
stand by the entrance to my  fortress. His job will be to shoot anyone
who rides up to challenge me.  http://www.eviloverlord.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Chris Browne
[EMAIL PROTECTED] (Michael Stone) writes:
 On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote:
 which have a clear and measurable effect on performance and are
 fixable without bloating the PG code.  Some of these issues (COPY
 path, context switching

 Does that include increasing the size of read/write blocks? I've
 noticed that with a large enough table it takes a while to do a
 sequential scan, even if it's cached; I wonder if the fact that it
 takes a million read(2) calls to get through an 8G table is part of
 that.

But behind the scenes, the OS is still going to have to evaluate the
"is this in cache?" question for each and every one of those pages.
(Assuming the kernel's page size is 8K; if it's smaller, the number of
evaluations will be even higher...)

Grouping the read(2) calls together isn't going to have any impact on
_that_ evaluation.
-- 
let name = "cbbrowne" and tld = "ntlug.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/finances.html
"People who don't use computers are more sociable, reasonable, and ...
less twisted" -- Arthur Norman

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread William Yu

Donald Courtney wrote:

in that even if you ran PostgreSQL on a 64-bit address space
with a larger number of CPUs you won't see much of a scale-up
and possibly even a drop.   I am not alone in having the *expectation*


What's your basis for believing this is the case? Why would PostgreSQL's 
dependence on the OS's caching/filesystem limit scalability? I know when 
I went from 32bit to 64bit Linux, I got *HUGE* increases in performance 
using the same amount of memory. And when I went from 2x1P to 2xDC, my 
average cpu usage % dropped almost in half.



that a database should have some cache size parameter and
the option to skip the file system.   If I use oracle, sybase, mysql
and maxdb they all have the ability to size a data cache and move
to 64 bits.


Josh Berkus has already mentioned this as conventional wisdom as written 
by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been 
around for a long time; it was probably a clear performance win way back 
when. Nowadays with how far open-source OS's have advanced, I'd take it 
with a grain of salt and do my own performance analysis. I suspect the 
big vendors wouldn't change their stance even if they knew it was no 
longer true due to the support hassles.


My personal experience with PostgreSQL. Dropping shared buffers from 2GB 
to 750MB improved performance on my OLTP DB a good 25%. Going down from 
750MB to 150MB was another +10%.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread PFC


Josh Berkus has already mentioned this as conventional wisdom as written  
by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been  
around for a long time; it was probably a clear performance win way back  
when. Nowadays with how far open-source OS's have advanced, I'd take it  
with a grain of salt and do my own performance analysis. I suspect the  
big vendors wouldn't change their stance even if they knew it was no  
longer true due to the support hassles.


Reinvent a filesystem... that would be suicidal.

	Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking
his Reiser4 especially for Postgres. In his own words, he wants a "killer
app" for reiser4. Reiser4 will offer transactional semantics via a
special reiser4 syscall, so it might be possible, with a minimum of
changes to postgres (ie maybe just another sync mode besides fsync,
fdatasync et al) to use this. Other interesting details were exposed on
the reiser list, too (ie. a transactional filesystem can give ACID
guarantees to postgres without the need for fsync()).


Very interesting.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Josh Berkus
PFC,

 Now, Hans Reiser has expressed interest on the ReiserFS list in
 tweaking his Reiser4 especially for Postgres. In his own words, he wants
 a "killer app" for reiser4. Reiser4 will offer transactional semantics via
 a special reiser4 syscall, so it might be possible, with a minimum of
 changes to postgres (ie maybe just another sync mode besides fsync,
 fdatasync et al) to use this. Other interesting details were exposed on the
 reiser list, too (ie. a transactional filesystem can give ACID guarantees
 to postgres without the need for fsync()).

Really?  Cool, I'd like to see that.   Could you follow up with Hans?  Or give 
me his e-mail?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Gavin Sherry
On Wed, 24 Aug 2005, PFC wrote:


  Josh Berkus has already mentioned this as conventional wisdom as written
  by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been
  around for a long time; it was probably a clear performance win way back
  when. Nowadays with how far open-source OS's have advanced, I'd take it
  with a grain of salt and do my own performance analysis. I suspect the
  big vendors wouldn't change their stance even if they knew it was no
  longer true due to the support hassles.

   Reinvent a filesystem... that would be suicidal.

    Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking
  his Reiser4 especially for Postgres. In his own words, he wants a "killer
  app" for reiser4. Reiser4 will offer transactional semantics via a
  special reiser4 syscall, so it might be possible, with a minimum of
  changes to postgres (ie maybe just another sync mode besides fsync,
  fdatasync et al) to use this. Other interesting details were exposed on
  the reiser list, too (ie. a transactional filesystem can give ACID
  guarantees to postgres without the need for fsync()).

   Very interesting.

Ummm... I don't see anything here which will be a win for Postgres. The
transactional semantics we're interested in are fairly complex:

1) Modifications to multiple objects can become visible to the system
atomically
2) On error, a series of modifications which had been grouped together
within a transaction can be rolled back
3) Using object version information, determine which version of which
object is visible to a given session
4) Using version information and locking, detect and resolve read/write
and write/write conflicts

Now, I can see a file system offering (1) and (2). But a file system that
can allow people to do (3) and (4) would require that we make *major*
modifications to how postgresql is implemented. Moreover, it would be for
no gain, since we've already written a system which can do it.

A filesystem could, in theory, help us by providing an API which allows us
to tell the file system either the way we'd like it to read ahead, the
fact that we don't want it to read ahead, or the way we'd like it to cache
(or not cache) data. The thing is, most OSes provide interfaces to do this
already and we make only little use of them (I'm thinking of
madvise()'s MADV_SEQUENTIAL and MADV_RANDOM, POSIX fadvise(), the various
flags to open() which AIX, HPUX, Solaris provide).
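
A minimal sketch of those hint interfaces (error handling abbreviated,
file name hypothetical): posix_fadvise() hints on a file descriptor, and
madvise() is the analogue for an mmap()ed region.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("relation_file", O_RDONLY);  /* hypothetical file */
        if (fd < 0)
            return 1;

        /* We will scan sequentially: let the kernel read ahead
           aggressively (POSIX_FADV_RANDOM would say the opposite). */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        struct stat st;
        fstat(fd, &st);
        void *p = mmap(NULL, (size_t) st.st_size, PROT_READ,
                       MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED)
            madvise(p, (size_t) st.st_size, MADV_RANDOM);

        /* ... scan or probe the data ... */
        return 0;
    }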

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Tom Lane
Gavin Sherry [EMAIL PROTECTED] writes:
 A filesystem could, in theory, help us by providing an API which allows us
 to tell the file system either the way we'd like it to read ahead, the
 fact that we don't want it to read ahead, or the way we'd like it to cache
 (or not cache) data. The thing is, most OSes provide interfaces to do this
 already and we make only little use of them (I'm thinking of
 madvise()'s MADV_SEQUENTIAL and MADV_RANDOM, POSIX fadvise(), the various
 flags to open() which AIX, HPUX, Solaris provide).

Yeah ... the main reason we've not spent too much time on that sort of
stuff is that *it's not portable*.  And with all due respect to Hans,
special tweaks for one filesystem are even less interesting than special
tweaks for one OS.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Gavin Sherry
On Wed, 24 Aug 2005, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  A filesystem could, in theory, help us by providing an API which allows us
  to tell the file system either the way we'd like it to read ahead, the
  fact that we don't want it to read ahead, or the way we'd like it to cache
  (or not cache) data. The thing is, most OSes provide interfaces to do this
  already and we make only little use of them (I'm thinking of
  madvise()'s MADV_SEQUENTIAL and MADV_RANDOM, POSIX fadvise(), the various
  flags to open() which AIX, HPUX, Solaris provide).

 Yeah ... the main reason we've not spent too much time on that sort of
 stuff is that *it's not portable*.  And with all due respect to Hans,
 special tweaks for one filesystem are even less interesting than special
 tweaks for one OS.

Right.

As an aside, it seems to me that if there is merit in all this low level
interaction with the file system (not to mention the other platform
specific microoptimisations which come up regularly on the lists) then the
companies currently producing niche commercial releases of PostgreSQL
should be taking advantage of them: if it increases performance, then
there's a reason to buy as opposed to just downloading the OSS version.

Gavin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings