Re: [PERFORM] Caching by Postgres
The first, to always remember - is that the move from 64-bits to 32-bits doesn't come for free. In a real 64-bit system with a 64-bit operating system, and 64-bit applications, pointers are now double their 32-bit size. This means more bytes to copy around memory, and in an extreme case, has the potential to approach halfing both the memory latency to access many such pointers from RAM, and half the effective amount of RAM. In real world cases, not everything is a pointer, so this sort of performance degradation is doubtful - but it is something to keep in mind. In addition to the above it lessens the effects of the CPU cache, so be sure to take the larger cached versions if you have structures needing to fit into the cache... my 0.02 EUR thomas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
Great discussion and illuminating for those of us who are still learning the subtleties of postGres. William To be clear - I built postgreSQL 8.1 64K bit on solaris 10 a few months ago and side by side with the 32 bit postgreSQL build saw no improvement. In fact the 64 bit result was slightly lower. I used the *same 64 bit S10 OS* for both versions. I think your experience makes sense since your change was from 32 to 64 bit Linux. From my experiment I am surmising that there will not be any file/os/buffer-cache scale up effect on the same OS with postgreSQL 64. I was testing on a 4 core system in both cases. William Yu wrote: Donald Courtney wrote: in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* What's your basis for believing this is the case? Why would PostgreSQL's dependence on the OS's caching/filesystem limit scalability? I know when I went from 32bit to 64bit Linux, I got *HUGE* increases in performance using the same amount of memory. And when I went from 2x1P to 2xDC, my average cpu usage % dropped almost in half. that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Josh Berkus has already mentioned this as conventional wisdom as written by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been around for a long time; it was probably a clear performance win way back when. Nowadays with how far open-source OS's have advanced, I'd take it with a grain of salt and do my own performance analysis. I suspect the big vendors wouldn't change their stance even if they knew it was no longer true due to the support hassles. My personal experience with PostgreSQL. Dropping shared buffers from 2GB to 750MB improved performance on my OLTP DB a good 25%. Going down from 750MB to 150MB was another +10%. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
* Donald Courtney ([EMAIL PROTECTED]) wrote: To be clear - I built postgreSQL 8.1 64K bit on solaris 10 a few months ago and side by side with the 32 bit postgreSQL build saw no improvement. In fact the 64 bit result was slightly lower. That makes some sense actually. It really depends on what you're doing alot of the time. On a Sparc system you're not likely to get much of a speed improvment by going to 64bit (unless, maybe, you're doing lots of intensive 64bit math ops). You'll have larger pointers and whatnot though. I used the *same 64 bit S10 OS* for both versions. I think your experience makes sense since your change was from 32 to 64 bit Linux. 32bit to 64bit Linux on a Sparc platform really shouldn't affect performance all that much (I'd expect it to be similar to 32bit to 64bit under Solaris actually, at least in terms of the performance difference). 32bit to 64bit Linux on an amd64 platform is another matter entirely though, but not because of the number of bits involved. Under amd64, 32bit is limited to 32bit on i386 which has a limited number of registers and whatnot. Under amd64/64bit you get more registers (and I think some other niceities) which will improve performance. That's not a 32bit vs. 64bit thing, that's i386 vs. native amd64. It's really mainly an oddity of the platform. On a mips system I'd expect the same kind of performance difference between 32bit and 64bit as you'd see on a sparc platform. Enjoy, Stephen signature.asc Description: Digital signature
Re: [PERFORM] Caching by Postgres
Really? Cool, I'd like to see that. Could you follow up with Hans? Or give me his e-mail? You can subscribe to the Reiser mailinglist on namesys.com or : [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 09:21:12AM -0400, Donald Courtney wrote: I built postgreSQL 8.1 64K bit on solaris 10 a few months ago and side by side with the 32 bit postgreSQL build saw no improvement. In fact the 64 bit result was slightly lower. I've had this sort of argument with a friend of mine who works at a retail computer sales company who always tries to pitch 64-bit platforms to me (I don't have one yet). There are a few issues in here that are hard to properly detach to allow for a fair comparison. The first, to always remember - is that the move from 64-bits to 32-bits doesn't come for free. In a real 64-bit system with a 64-bit operating system, and 64-bit applications, pointers are now double their 32-bit size. This means more bytes to copy around memory, and in an extreme case, has the potential to approach halfing both the memory latency to access many such pointers from RAM, and half the effective amount of RAM. In real world cases, not everything is a pointer, so this sort of performance degradation is doubtful - but it is something to keep in mind. In response to this, it appears that, at least on the Intel/AMD side of things, they've increased the bandwidth on the motherboard, and allowed for faster memory to be connected to the motherboard. They've increased the complexity of the chip, to allow 64-bit register operations to be equivalent in speed to 32-bit register operations. I have no idea what else they've done... :-) So, it may be difficult to properly compare a 32-bit system to a 64-bit system. Even if the Ghz on the chip appears equal, it isn't the same chip, and unless it is the exact same make, product and version of the motherboard, it may not be a fair compairson. Turning support for 32-bit on or off, and using a kernel that is only 32-bit may give good comparisons - but with the above explanation, I would expect the 32-bit application + kernel to out-perform the 64-bit application. So then we move on to what 64-bit is really useful for. Obviously, there is the arithmetic. If you were previously doing 64-bit arithmetic through software, you will notice an immediate speed improvement when doing it through hardware instead. If you have a program that is scanning memory in any way, it may benefit from 64-bit instructions (for example - copying data 64-bit words at a time instead of 32-bit words at a time). PostgreSQL might benefit slightly from either of these, slightly balancing the performance degradation of using more memory to store the pointers, and more memory bandwidth the access the pointers. The real benefit of 64-bit is address space. From the kernel perspective, it means that more programs, or bigger programs can run at once. From the application perspective, it means your application can use more than 32-bits of address space. For programs that make extensive use of mmap(), this can be a necessity. They are mapping very large files into their own address space. This isn't a performance boost, as much as it is a 'you can't do it', if the files mmap()'ed at the same time, will not fit within 32-bits of address space. This also becomes, potentially, a performance degradation, as the system is now having to manage applications that have very large page tables. Page faults may become expensive. PostgreSQL uses read(), instead of mmap(), and uses 2 Gbyte files. PostgreSQL doesn't require the additional address space for normal operation. If, however, you happen to have a very large amount of physical memory - more memory than is supported by a 32-bit system, but is supported by your 64-bit system, then the operating system should be able to use this additional physical memory to cache file system data pages, which will benefit PostgreSQL if used with tables that are larger than the memory supported by your 32-bit system, and which have queries which require more pages than the memory supported by your 32-bit system to be frequently accessed. If you have a huge database, with many clients accessing the data, this would be a definate yes. With anything less, it is a maybe, or a probably not. I've been looking at switching to 64-bit, mostly to benefit from the better motherboard bandwidth, and just to play around. I'm not expecting to require the 64-bit instructions. Hope this helps, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
Donald Courtney wrote: I built postgreSQL 8.1 64K bit on solaris 10 a few months ago and side by side with the 32 bit postgreSQL build saw no improvement. In fact the 64 bit result was slightly lower. I'm not surprised 32-bit binaries running on a 64-bit OS would be faster than 64-bit/64-bit. 64-bit isn't some magical wand you wave and it's all ok. Programs compiled as 64-bit will only run faster if (1) you need 64-bit address space and you've been using ugly hacks like PAE to get access to memory 2GB or (2) you need native 64-bit data types and you've been using ugly hacks to piece 32-bit ints together (example, encryption/compression). In most cases, 64-bit will run slightly slower due to extra overhead of using larger datatypes. Since PostgreSQL hands off the majority of memory management/data caching to the OS, only the OS needs to be 64-bit to reap the benefits of better memory management. Since Postgres *ALREADY* reaps the 64-bit benefit, I'm not sure how the argument moving caching/mm/fs into Postgres would apply. Yes there's the point about possibly implementing better/smarter/more appropriate caching algorithms but that has nothing to do with 64-bit. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
[EMAIL PROTECTED] wrote: So then we move on to what 64-bit is really useful for. Obviously, there is the arithmetic. If you were previously doing 64-bit arithmetic through software, you will notice an immediate speed improvement when doing it through hardware instead. If you have a program that is scanning memory in any way, it may benefit from 64-bit instructions (for example - copying data 64-bit words at a time instead of 32-bit words at a time). PostgreSQL might benefit slightly from either of these, slightly balancing the performance degradation of using more memory to store the pointers, and more memory bandwidth the access the pointers. At least on Sparc processors, v8 and newer, any double precision math (including longs) is performed with a single instruction, just like for a 32 bit datum. Loads and stores of 8 byte datums are also handled via a single instruction. The urban myth that 64bit math is different/better on a 64 bit processor is just that; yes, some lower end processors would emulate/trap those instructions but that an implementation detail, not architecture.I believe that this is all true for other RISC processors as well. The 64bit API on UltraSparcs does bring along some extra FP registers IIRC. If, however, you happen to have a very large amount of physical memory - more memory than is supported by a 32-bit system, but is supported by your 64-bit system, then the operating system should be able to use this additional physical memory to cache file system data pages, which will benefit PostgreSQL if used with tables that are larger than the memory supported by your 32-bit system, and which have queries which require more pages than the memory supported by your 32-bit system to be frequently accessed. If you have a huge database, with many clients accessing the data, this would be a definate yes. With anything less, it is a maybe, or a probably not. Solaris, at least, provided support for far more than 4GB of physical memory on 32 bit kernels. A newer 64 bit kernel might be more efficient, but that's just because the time was taken to support large page sizes and more efficient data structures. It's nothing intrinsic to a 32 vs 64 bit kernel. -- Alan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 02:47:09PM -0400, Alan Stange wrote: At least on Sparc processors, v8 and newer, any double precision math (including longs) is performed with a single instruction, just like for a 32 bit datum. Loads and stores of 8 byte datums are also handled via a single instruction. The urban myth that 64bit math is different/better on a 64 bit processor is just that; yes, some lower end processors would emulate/trap those instructions but that an implementation detail, not architecture. It isn't an urban myth that 64-bit math on a 64-bit processor is faster, at least if done using registers. It definately is faster. It may be an urban myth, though, that most applications perform a sufficient amount of 64-bit arithmetic to warrant the upgrade. The benefit may be lost in the noise for an application such as PostgreSQL. It takes, effectively, infinately longer to access a disk page, than to increment a 64-bit integer in software. For the lower end processors that emulate/trap these instructions, they are being performed in software, along with the overhead of a trap, and are therefore not a single instruction any more. We are coming at this from different sides (which is good - perspective is always good :-) ). From the Intel/AMD side of things, ALL non 64-bit platforms are 'lower end processors', and don't emulate/trap the instructions as they didn't exist (at least not yet - who knows what clever and sufficiently motivated people will do :-) ). If, however, you happen to have a very large amount of physical memory - more memory than is supported by a 32-bit system, but is supported by your 64-bit system, then the operating system should be able to use this additional physical memory to cache file system data pages, which will benefit PostgreSQL if used with tables that are larger than the memory supported by your 32-bit system, and which have queries which require more pages than the memory supported by your 32-bit system to be frequently accessed. If you have a huge database, with many clients accessing the data, this would be a definate yes. With anything less, it is a maybe, or a probably not. Solaris, at least, provided support for far more than 4GB of physical memory on 32 bit kernels. A newer 64 bit kernel might be more efficient, but that's just because the time was taken to support large page sizes and more efficient data structures. It's nothing intrinsic to a 32 vs 64 bit kernel. Hehe. That's why I was so careful to qualify my statements. :-) But yeah, I agree. It's a lot of hype, for not much gain (and some loss, depending on what it is being used for). I only want one because they're built better, and because I want to play around. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Caching by Postgres
At least on Sparc processors, v8 and newer, any double precision math (including longs) is performed with a single instruction, just like for a 32 bit datum. Loads and stores of 8 byte datums are also handled via a single instruction. The urban myth that 64bit math is different/better on a 64 bit processor is just that; yes, some lower end processors would emulate/trap those instructions but that an implementation detail, not architecture.I believe that this is all true for other RISC processors as well. The 64bit API on UltraSparcs does bring along some extra FP registers IIRC. It's very different on x86. 64-bit x86 like the Opteron has more registers, which are very scarce on the base x86 (8 I think). This alone is very important. There are other factors as well. Solaris, at least, provided support for far more than 4GB of physical memory on 32 bit kernels. A newer 64 bit kernel might be more efficient, but that's just because the time was taken to support large page sizes and more efficient data structures. It's nothing intrinsic to a 32 vs 64 bit kernel. Well, on a large working set, a processor which can directly address more than 4GB of memory will be a lot faster than one which can't, and has to play with the MMU and paging units ! ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 03:34:41PM -0400, [EMAIL PROTECTED] wrote: It isn't an urban myth that 64-bit math on a 64-bit processor is faster, at least if done using registers. It definately is faster. It may be an urban myth, though, that most applications perform a sufficient amount of 64-bit arithmetic to warrant the upgrade. The mjor problem is that the definition of 64bit processor is fuzzy. The major slowdown of 64bitness is the necessity of carting around 64 bit pointers. It's not, however, necessary to do 64bit pointers to get 64bit registers fast 64 bit ops. E.g., sgi has n32 n64 abi's which can access exactly the same instruction set registers, the difference between them is the size of pointers and whether a long is the same as a long long. Any discussion of 64 bit processors is doomed from the start because people tend to start making implementation assumptions on top of an already vague concept. Current future discussions are tinged by the fact that amd64 *doubles* the number of registers in 64 bit mode, potentially providing a major speedup--but one that doesn't really have anything to do with being 64bit. Pretty much any discussion of 64 bit mode really needs to be a discussion of a particular abi on a particular processor; talking about 64 bit processors abstractly is a waste of time. Mike Stone ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 05:09:04PM -0400, Alan Stange wrote: The older 32bit RISC processors do have 64 bit registers, ALUs and datapaths, and they are marketed toward high end scientific computing, and you're claiming that such a processor is slower than one which has the addition of 64 bit pointers added to it? No. I'm claiming that you are talking about a hybrid 64/32 processor, and that this isn't fair to declare that 64-bit arithmetic units don't provide benefit for 64-bit math. :-) As an example, an UltraSparc running a 32 bit kernel+application will have the same double precision floating point performance as one running a 64bit kernel+application (except for the additional FP registers in the 64bit API). For a function like daxpy, it's the exact same hardware running the exact same instructions! So why do you think the performance would be different? Double precision floating point isn't 64-bit integer arithmetic. I think this is all a little besides the point. If you point is that the SPARC was designed well - I agree with you. I won't agree that a SPARC with 64-bit registers should be considered a 32-bit machine. The AMD 64-bit machines come in two forms as well - the ones that allow you to use the 64-bit integer registers (not floating point! those are already 80-bit!), and the ones that allow you to address more memory. I wouldn't consider either to be a 32-bit CPU, although they will allow 32-bit applications to run fine. I believe the IBM Power processors also upped everything to double precision internally because of some details of the multiply-add fused instructions. It's been a few years since I taught HP to CS undergrads, but I'm fairly sure the details are all the same for MIPS processors as well. Smart design, that obscures the difference - but doesn't make the difference a myth. If it's already there, then it's already there, and we can't talk as if it isn't. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
gokulnathbabu manoharan wrote: Hi all, I like to know the caching policies of Postgresql. What parameter in the postgresql.conf affects the cache size used by the Postgresql? As far as I have searched my knowledge of the parameters are In general, you don't. The OS handles caching based on file usage. So if you are using the files, the OS should cache them. Just like it does with any other program. 1. shared_buffers - Sets the limit on the amount of shared memory used. If I take this is as the cache size then my performance should increase with the increase in the size of shared_buffers. But it seems it is not the case and my performance actually decreases with the increase in the shared_buffers. I have a RAM size of 32 GB. The table which I use more frequently has around 68 million rows. Can I cache this entire table in RAM? There is a portion of this which is used for caching. But I believe before 8.1 there was code that went linearly through all of the shared_buffers and checked for dirty/clean pages. So there was a tradeoff that the bigger you make it, the longer that search goes. So you got diminishing returns, generally around 10k shared buffers. I think it is better in 8.1, but if the OS is going to cache it anyway (since it does), then having a Postgres cache is just wasting memory, and not letting cache as much. So I'm guessing that with 8.1 there would be 2 sweet spots. Low shared_buffers (= 10k), and really high shared buffers (like all of available ram). But because postgres has been tuned for the former I would stick with it (I don't think shared_buffers can go 2GB, but that might just be work_mem/maintenance_work_mem). 2. work_mem - It is the amount of memory used by an operation. My guess is once the operation is complete this is freed and hence has nothing to do with the caching. 3. effective_cache_size - The parameter used by the query planner and has nothing to do with the actual caching. This is important from a planner issue. Because the planner can then expect that the OS is doing its job and caching the tables, so index scans are cheaper than they would be otherwise. John =:- So kindly help me in pointing me to the correct parameter to set. It will be great if you can point me to the docs that explains the implementation of caching in Postgresql which will help me in understanding things much clearly. Thanks in advance. Gokul. signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Caching by Postgres
John, So I'm guessing that with 8.1 there would be 2 sweet spots. Low shared_buffers (= 10k), and really high shared buffers (like all of available ram). But because postgres has been tuned for the former I would stick with it (I don't think shared_buffers can go 2GB, but that might just be work_mem/maintenance_work_mem). I'll be testing this as soon as we get some issues with the 64bit shared_buffer patch worked out. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 10:10:45 -0700, gokulnathbabu manoharan [EMAIL PROTECTED] wrote: Hi all, I like to know the caching policies of Postgresql. What parameter in the postgresql.conf affects the cache size used by the Postgresql? As far as I have searched my knowledge of the parameters are The main policy is to let the OS do most of the caching. 1. shared_buffers - Sets the limit on the amount of shared memory used. If I take this is as the cache size then my performance should increase with the increase in the size of shared_buffers. But it seems it is not the case and my performance actually decreases with the increase in the shared_buffers. I have a RAM size of 32 GB. The table which I use more frequently has around 68 million rows. Can I cache this entire table in RAM? Using extermely large values for shared buffers is known to be a performance loss for Postgres. Some improvements were made for 8.0 and more for 8.1. The OS will cache frequently used data from files for you. So if you are using that table a lot and the rows aren't too wide, it should mostly be cached for you by the OS. 2. work_mem - It is the amount of memory used by an operation. My guess is once the operation is complete this is freed and hence has nothing to do with the caching. This is used for sorts and some other things. 3. effective_cache_size - The parameter used by the query planner and has nothing to do with the actual caching. You are supposed to use this to give the planner an idea about how much space the OS will using for caching on behalf of Posgres. So kindly help me in pointing me to the correct parameter to set. It will be great if you can point me to the docs that explains the implementation of caching in Postgresql which will help me in understanding things much clearly. You probably want to read the following: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
I mean well with this comment - This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Is this a crazy idea - that a project be started to get this adopted? Is it too big and structural to contemplate? From one who likes postgreSQL dc Frank Wiles wrote: On Tue, 23 Aug 2005 10:10:45 -0700 (PDT) gokulnathbabu manoharan [EMAIL PROTECTED] wrote: Hi all, I like to know the caching policies of Postgresql. What parameter in the postgresql.conf affects the cache size used by the Postgresql? As far as I have searched my knowledge of the parameters are 1. shared_buffers - Sets the limit on the amount of shared memory used. If I take this is as the cache size then my performance should increase with the increase in the size of shared_buffers. But it seems it is not the case and my performance actually decreases with the increase in the shared_buffers. I have a RAM size of 32 GB. The table which I use more frequently has around 68 million rows. Can I cache this entire table in RAM? increasing shared_buffers to a point helps, but after a certain threshold it can actually degree performance. 2. work_mem - It is the amount of memory used by an operation. My guess is once the operation is complete this is freed and hence has nothing to do with the caching. This is the amount of memory used for things like sorts and order bys on a per backend process basis. 3. effective_cache_size - The parameter used by the query planner and has nothing to do with the actual caching. The instructs the query planner on how large the operating system's disk cache is. There isn't a built in cache, PostgreSQL relies on the operating system to cache the on disk information based on how often it is used. In most cases this is probably more accurate anyway. I wrote an article on PostgreSQL performance tuning that has links to several other related sites, you can find it here: http://www.revsys.com/writings/postgresql-performance.html - Frank Wiles [EMAIL PROTECTED] http://www.wiles.org - ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Caching by Postgres
Donald, This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. Since when? Barring the context switch bug, you're not going to get a drop with more processors/more RAM. You may fail to get any gain, though. If your database is only 100MB in size, having 11G of cache space isn't going to help you much over having only 1G. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. Sure, because that's the conventional wisdom, as writ by Oracle. However, this comes with substantial code maintenance costs and portability limitations which have to be measured against any gain in performance. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. And yet, we regularly outperform Sybase and MySQL on heavy OLTP loads on commodity x86 hardware.So apparently DB caching isn't everything. ;-) I'm not saying that it's not worth testing larger database caches -- even taking over most of RAM -- on high-speed systems. In fact, I'm working on doing that kind of test now. However, barring test results, we can't assume that taking over RAM and the FS cache would have a substantial performance benefit; that remains to be shown. The other thing is that we've had, and continue to have, low-hanging fruit which have a clear and measurable effect on performance and are fixable without bloating the PG code. Some of these issues (COPY path, context switching, locks, GiST concurrency, some aggregates) have been addressed in the 8.1 code; some remain to be addressed (sorts, disk spill, 64-bit sort mem, other aggregates, index-only access, etc.). Why tackle a huge, 250-hour project which could fail when a 20-hour patch is more likely to provide the same performance benefit? We have the same discussion (annually) about mmap. Using mmap *might* provide us with a huge performance boost. However, it would *definitely* require 300hours (or more) of programmer time to test properly, and might not benefit us at all. Of course, if *you* want to work on large database cache improvements, be my guest ... it's an open source project! Submit your patches! I'll be happy to test them. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 02:41:39PM -0400, Donald Courtney wrote: I mean well with this comment - This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Is this a crazy idea - that a project be started to get this adopted? Is it too big and structural to contemplate? From one who likes postgreSQL Hey Donald. :-) This is an operating system issue, not a PostgreSQL issue. If you have more physical memory than fits in 32-bit addresses, and your operating system isn't using this extra memory to cache files (or anything else), than your OS is what I would consider to be broken (or at the very least, not designed for a 64-bit host). The only questions that can be asked here is - 1) can PostgreSQL do a better job than the OS at best utilizing system RAM, and 2) if so, is the net gain worth the added complexity to PostgreSQL? I happen to think that yes, PostgreSQL can do a better job than most OS's, as it has better information to make decisions as to which pages are worth keeping, and which are not, but no, it isn't worth the effort until PostgreSQL developers start running out of things to do. Buy your 64-bit platforms - but if page caching is your concern, 1) ensure that you really have more physical memory than can fit in 32 bits, and 2) ensure that your operating system is comfortable caching data pages from files above the 32-bit mark. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote: which have a clear and measurable effect on performance and are fixable without bloating the PG code. Some of these issues (COPY path, context switching Does that include increasing the size of read/write blocks? I've noticed that with a large enough table it takes a while to do a sequential scan, even if it's cached; I wonder if the fact that it takes a million read(2) calls to get through an 8G table is part of that. Mike Stone ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Caching by Postgres
[EMAIL PROTECTED] (Donald Courtney) writes: I mean well with this comment - This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Is this a crazy idea - that a project be started to get this adopted? Is it too big and structural to contemplate? This project amounts to Implement Your Own Operating System, because it requires that the DBMS take over the things that operating systems normally do, like: a) Managing access to filesystems and b) Managing memory The world is already sufficiently filled up with numerous variations of Linux, BSD 4.4 Lite, and UNIX System V; I can't see justification for reinventing this wheel still again. -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://cbbrowne.com/info/multiplexor.html Rules of the Evil Overlord #196. I will hire an expert marksman to stand by the entrance to my fortress. His job will be to shoot anyone who rides up to challenge me. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
[EMAIL PROTECTED] (Michael Stone) writes: On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote: which have a clear and measurable effect on performance and are fixable without bloating the PG code. Some of these issues (COPY path, context switching Does that include increasing the size of read/write blocks? I've noticed that with a large enough table it takes a while to do a sequential scan, even if it's cached; I wonder if the fact that it takes a million read(2) calls to get through an 8G table is part of that. But behind the scenes, the OS is still going to have to evaluate the is this in cache? question for each and every one of those pages. (Assuming the kernel's page size is 8K; if it's smaller, the number of evaluations will be even higher...) Grouping the read(2) calls together isn't going to have any impact on _that_ evaluation. -- let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;; http://www3.sympatico.ca/cbbrowne/finances.html People who don't use computers are more sociable, reasonable, and ... less twisted -- Arthur Norman ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Caching by Postgres
Donald Courtney wrote: in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* What's your basis for believing this is the case? Why would PostgreSQL's dependence on the OS's caching/filesystem limit scalability? I know when I went from 32bit to 64bit Linux, I got *HUGE* increases in performance using the same amount of memory. And when I went from 2x1P to 2xDC, my average cpu usage % dropped almost in half. that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Josh Berkus has already mentioned this as conventional wisdom as written by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been around for a long time; it was probably a clear performance win way back when. Nowadays with how far open-source OS's have advanced, I'd take it with a grain of salt and do my own performance analysis. I suspect the big vendors wouldn't change their stance even if they knew it was no longer true due to the support hassles. My personal experience with PostgreSQL. Dropping shared buffers from 2GB to 750MB improved performance on my OLTP DB a good 25%. Going down from 750MB to 150MB was another +10%. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching by Postgres
Josh Berkus has already mentioned this as conventional wisdom as written by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been around for a long time; it was probably a clear performance win way back when. Nowadays with how far open-source OS's have advanced, I'd take it with a grain of salt and do my own performance analysis. I suspect the big vendors wouldn't change their stance even if they knew it was no longer true due to the support hassles. Reinvent a filesystem... that would be suicidal. Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking his Reiser4 especially for Postgres. In his own words, he wants a Killer app for reiser4. Reiser4 will offser transactional semantics via a special reiser4 syscall, so it might be possible, with a minimum of changes to postgres (ie maybe just another sync mode besides fsync, fdatasync et al) to use this. Other interesting details were exposed on the reiser list, too (ie. a transactional filesystems can give ACID guarantees to postgres without the need for fsync()). Very interesting. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
PFC, Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking his Reiser4 especially for Postgres. In his own words, he wants a Killer app for reiser4. Reiser4 will offser transactional semantics via a special reiser4 syscall, so it might be possible, with a minimum of changes to postgres (ie maybe just another sync mode besides fsync, fdatasync et al) to use this. Other interesting details were exposed on the reiser list, too (ie. a transactional filesystems can give ACID guarantees to postgres without the need for fsync()). Really? Cool, I'd like to see that. Could you follow up with Hans? Or give me his e-mail? -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
On Wed, 24 Aug 2005, PFC wrote: Josh Berkus has already mentioned this as conventional wisdom as written by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been around for a long time; it was probably a clear performance win way back when. Nowadays with how far open-source OS's have advanced, I'd take it with a grain of salt and do my own performance analysis. I suspect the big vendors wouldn't change their stance even if they knew it was no longer true due to the support hassles. Reinvent a filesystem... that would be suicidal. Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking his Reiser4 especially for Postgres. In his own words, he wants a Killer app for reiser4. Reiser4 will offser transactional semantics via a special reiser4 syscall, so it might be possible, with a minimum of changes to postgres (ie maybe just another sync mode besides fsync, fdatasync et al) to use this. Other interesting details were exposed on the reiser list, too (ie. a transactional filesystems can give ACID guarantees to postgres without the need for fsync()). Very interesting. Ummm... I don't see anything here which will be a win for Postgres. The transactional semantics we're interested in are fairly complex: 1) Modifications to multiple objects can become visible to the system atomically 2) On error, a series of modifications which had been grouped together within a transaction can be rolled back 3) Using object version information, determine which version of which object is visible to a given session 4) Using version information and locking, detect and resolve read/write and write/write conflicts Now, I can see a file system offering (1) and (2). But a file system that can allow people to do (3) and (4) would require that we make *major* modifications to how postgresql is implemented. More over, it would be for no gain, since we've already written a system which can do it. A filesystem could, in theory, help us by providing an API which allows us to tell the file system either: the way we'd like it to read ahead, the fact that we don't want it to read ahead or the way we'd like it to cache (or not cache) data. The thing is, most OSes provide interfaces to do this already and we make only little use of them (I'm think of madv_sequential(), madv_random(), POSIX fadvise(), the various flags to open() which AIX, HPUX, Solaris provide). Gavin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
Gavin Sherry [EMAIL PROTECTED] writes: A filesystem could, in theory, help us by providing an API which allows us to tell the file system either: the way we'd like it to read ahead, the fact that we don't want it to read ahead or the way we'd like it to cache (or not cache) data. The thing is, most OSes provide interfaces to do this already and we make only little use of them (I'm think of madv_sequential(), madv_random(), POSIX fadvise(), the various flags to open() which AIX, HPUX, Solaris provide). Yeah ... the main reason we've not spent too much time on that sort of stuff is that *it's not portable*. And with all due respect to Hans, special tweaks for one filesystem are even less interesting than special tweaks for one OS. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
On Wed, 24 Aug 2005, Tom Lane wrote: Gavin Sherry [EMAIL PROTECTED] writes: A filesystem could, in theory, help us by providing an API which allows us to tell the file system either: the way we'd like it to read ahead, the fact that we don't want it to read ahead or the way we'd like it to cache (or not cache) data. The thing is, most OSes provide interfaces to do this already and we make only little use of them (I'm think of madv_sequential(), madv_random(), POSIX fadvise(), the various flags to open() which AIX, HPUX, Solaris provide). Yeah ... the main reason we've not spent too much time on that sort of stuff is that *it's not portable*. And with all due respect to Hans, special tweaks for one filesystem are even less interesting than special tweaks for one OS. Right. As an aside, it seems to me that if there is merit in all this low level interaction with the file system (not to mention the other platform specific microoptimisations which come up regularly on the lists) then the companies currently producing niche commercial releases of PostgreSQL should be taking advantage of them: if it increases performance, then there's a reason to buy as opposed to just downloading the OSS version. Gavin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings