[HACKERS] pgbash-7.3 released
I'm pleased to announce the release of pgbash-7.3.

http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html

ChangeLog:
1. Fix a bug in CSV-format handling in the (pgbash original) copy command.
2. Update 'pgbashrc' for PostgreSQL-7.3.
3. The pgbash version number now tracks the PostgreSQL version number.

-- SAKAIDA Masaaki
Re: [HACKERS] PGP signing releases
Well said. I'm glad someone else is willing to take a stab at addressing these issues, since I've been down with the flu. Thanks, Greg. As both Gregs have pointed out, hashes and checksums alone should only be used as an integrity check; they are not a viable security mechanism. A hash provides neither authentication nor, even more importantly, verification of authentication. These concepts are key to creating a secure environment.

Regards,
-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting

On Mon, 2003-02-10 at 21:57, [EMAIL PROTECTED] wrote:
> > So you put the MD5 sum into the release announcement email. That is
> > downloaded by many people and also archived in many distributed places
> > that we don't control, so it would be very hard to tamper with. ISTM
> > that this gives you the same result as a PGP signature but with much
> > less administrative overhead.
>
> Not the same results. For one thing, the mailing announcement may be
> archived on Google, but asking people to search Google for an MD5 sum as
> they download the tarball is hardly feasible. Second, it still does not
> prevent someone from breaking into the server and replacing the tarball
> with their own version, and their own MD5 checksum. Or maybe just one of
> the mirrors. Users are not going to know to compare that MD5 with
> versions on the web somewhere. Third, it does not allow a positive
> history to be built up by signing many releases over time. With PGP,
> someone can be assured that the 9.1 tarball they just downloaded was
> signed by the same key that signed the 7.3 tarball they've been using
> for 2 years. Fourth, only with PGP can you trace your key to the one
> that signed the tarball, an additional level of security. MD5 provides
> an integrity check only. Any security it affords (such as storing the
> MD5 sum elsewhere) is trivial and should not be considered when using
> PGP is standard, easy to implement, and has none of MD5's weaknesses.
>
> -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8
[HACKERS] PostgreSQL Benchmarks
Hrm. I just saw that the PHP ADODB guy just published a bunch of database benchmarks. It's fairly evident to me that benchmarking PostgreSQL on Win32 isn't really fair:

http://php.weblogs.com/oracle_mysql_performance

*sigh*

Chris
Re: [HACKERS] PostgreSQL Benchmarks
> Hrm. I just saw that the PHP ADODB guy just published a bunch of
> database benchmarks. It's fairly evident to me that benchmarking
> PostgreSQL on Win32 isn't really fair:
> http://php.weblogs.com/oracle_mysql_performance

And why is the highly advocated transaction-capable MySQL 4 not tested? That's the problem: for every performance test they choose ISAM tables, and when transactions are mentioned it's said that MySQL has transactions. But why no benchmarks?

Regards,
Mario Weilguni
Re: [HACKERS] log_duration
Tom Lane [EMAIL PROTECTED] writes:
> Christopher Kings-Lynne [EMAIL PROTECTED] writes:
> > Looking at the log_duration postgresql.conf option. How about adding
> > an option log_duration_min, which is a value in milliseconds that is
> > the minimum time a query must run for before being logged?
>
> Fine with me --- but you'll need to add more logic than that. Right
> now, log_duration *only* causes the query duration to be printed out;
> if you ain't got log_statement on, you're in the dark as to what the
> query itself was. You'll need to add some code to print the query (the
> log_min_error_statement logic might be a useful source of inspiration).
> Not sure how this should interact with the case where log_duration is
> set and the min-duration isn't. But maybe that case is silly, and we
> should just redefine log_duration as a minimum runtime that causes the
> query *and* its runtime to be printed to the log.

Is it even guaranteed to be properly ordered on a busy server with multiple processors anyways?

One option is to have log_query output an identifier with the query, such as a hash of the query or the pointer value for the plan, suppressing duplicates. Then log_duration prints the identifier with the duration. This means on a busy server running lots of prepared queries you would see a whole bunch of queries on startup, then hopefully no durations. Any durations printed could cause alarms to go off. To find the query you grep the logs for the identifier in the duration message.

This only really works if you're using prepared queries everywhere. But in the long run that will be the case for OLTP systems, which is where log_duration is really useful.

-- greg
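For concreteness, a sketch of what the proposed knob could look like in postgresql.conf (log_duration_min is the name from Christopher's proposal, not an existing 7.3 setting, and the values are illustrative only; this is roughly the shape the idea eventually shipped in as log_min_duration_statement):

    # existing 7.3 behaviour:
    #log_duration = false        # print the duration of every completed query
    #log_statement = false       # print the query text itself

    # proposed (name illustrative):
    #log_duration_min = 500      # print the query *and* its duration, but
                                 # only when it ran for at least 500 ms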
Re: [HACKERS] PostgreSQL Benchmarks
On Tue, 2003-02-11 at 08:26, Christopher Kings-Lynne wrote:
> Hrm. I just saw that the PHP ADODB guy just published a bunch of
> database benchmarks. It's fairly evident to me that benchmarking
> PostgreSQL on Win32 isn't really fair:
> http://php.weblogs.com/oracle_mysql_performance
>
> *sigh*

How much of the performance difference is from the RDBMS, how much from the middleware, and how much from the quality of implementation in the middleware? While I'm not surprised that the cygwin version of PostgreSQL is slow, those results don't tell me anything about the quality of the middleware interface between PHP and PostgreSQL. Does anyone know if we can rule out some of the performance loss by pinning it to a bad middleware implementation for PostgreSQL?

Regards,
-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Re: [HACKERS] PostgreSQL Benchmarks
On Tue, 2003-02-11 at 08:31, Mario Weilguni wrote:
> > Hrm. I just saw that the PHP ADODB guy just published a bunch of
> > database benchmarks. It's fairly evident to me that benchmarking
> > PostgreSQL on Win32 isn't really fair:
> > http://php.weblogs.com/oracle_mysql_performance
>
> And why is the highly advocated transaction-capable MySQL 4 not tested?
> That's the problem: for every performance test they choose ISAM tables,
> and when transactions are mentioned it's said that MySQL has
> transactions. But why no benchmarks?

Insert Statement, not using bind variables (MySQL and Oracle): $DB->BeginTrans();
Insert Statement, using bind variables: $DB->BeginTrans();

The PL/SQL Insert Benchmark appears not to initiate a transaction. I'm assuming this is because it's implicitly within a transaction? Oddly enough, I am seeing explicit commits here.

It appears that the benchmarks are attempting to use transactions; however, I have no idea if MySQL's HEAP supports them. For all I know, transactions are being silently ignored.

Regards,
-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
[HACKERS] Hash grouping, aggregates
So one of the items on the TODO list is:

* Add hash for evaluating GROUP BY aggregates (Tom)

I'm finding this would benefit a lot of my queries. Most of the time seems to be going into sorts for group by clauses. I don't know how long it would take to build a hash of course, but I suspect it would be less than the sort. Is this something a beginner could figure out? I'm thinking I need a normal Hash node that builds exactly the same kind of hash as a join, then a HashScan node that picks all the rows out of the hash.

The neat thing is that hash aggregates would allow grouping on data types that have = operators but no useful < operator.

(Incidentally, I'm fond of "nested loop": I remember when I was a beginner SQL programmer looking at plans, and it was intuitively obvious what it meant. I suspect for a beginner looking at "nestloop" it might not be quite so obvious.)

-- greg
Re: [HACKERS] Hash grouping, aggregates
On Tue, Feb 11, 2003 at 09:48:11 -0500, Greg Stark [EMAIL PROTECTED] wrote:
> So one of the items on the TODO list is:
>
> * Add hash for evaluating GROUP BY aggregates (Tom)
>
> I'm finding this would benefit a lot of my queries. Most of the time
> seems to be going into sorts for group by clauses. I don't know how
> long it would take to build a hash of course, but I suspect it would be
> less than the sort. Is this something a beginner could figure out? I'm
> thinking I need a normal Hash node that builds exactly the same kind of
> hash as a join, then a HashScan node that picks all the rows out of the
> hash.

This is already in 7.4. You could try it out by building from CVS. From the HISTORY file:

System can use either hash- or sort-based strategy for grouped aggregation
Re: [HACKERS] Hash grouping, aggregates
Bruno Wolff III [EMAIL PROTECTED] writes:
> This is already in 7.4. You could try it out by building from CVS.
> From the HISTORY file:
>
> System can use either hash- or sort-based strategy for grouped
> aggregation

Ooh, doing that now. Thanks.

-- greg
Re: [HACKERS] Hash grouping, aggregates
Greg Stark [EMAIL PROTECTED] writes:
> So one of the items on the TODO list is:
>
> * Add hash for evaluating GROUP BY aggregates (Tom)

It's done in CVS tip ... give it a try.

> The neat thing is that hash aggregates would allow grouping on data
> types that have = operators but no useful < operator.

Hm. Right now I think that would barf on you, because the parser wants to find the '<' operator to label the grouping column with, even if the planner later decides not to use it. It'd take some redesign of the query data structure (specifically SortClause/GroupClause) to avoid that.

regards, tom lane
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
I've tested all the win32 versions of postgres I can get my hands on (cygwin and not), and my general feeling is that they have problems with insert performance with fsync() turned on, probably the fault of the OS. Select performance is not so much affected. This is easily solved with transactions and other such things. Also, Postgres benefits from PL just like Oracle.

May I make a suggestion that maybe it is time to start thinking about tuning the default config file? IMHO it's just a little bit too conservative, and it's hurting you in benchmarks being run by idiots, and it's still bad publicity. Any real database admin would know his tests are synthetic and not meaningful without having to look at the #s.

This is irritating me so much that I am going to put together a benchmark of my own, a real-world one, on (publicly available) real-world data. MySQL is a real dog in a lot of situations. The FCC publishes a database of wireless transmitters that has tables with 10 million records in it. I'll pump that into pg, run some benchmarks, real-world queries, and we'll see who the faster database *really* is. This is just a publicity issue, that's all. It's still annoying though.

I'll even run an open challenge to database admins to beat the query performance of postgres on such datasets, complex multi-table joins, etc. I'll even throw out the whole table-locking issue and analyze single-user performance.

Merlin

> How much of the performance difference is from the RDBMS, how much from
> the middleware, and how much from the quality of implementation in the
> middleware? While I'm not surprised that the cygwin version of
> PostgreSQL is slow, those results don't tell me anything about the
> quality of the middleware interface between PHP and PostgreSQL. Does
> anyone know if we can rule out some of the performance loss by pinning
> it to a bad middleware implementation for PostgreSQL?
>
> Regards,
> -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Re: [HACKERS] PGP signing releases
On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
> On Thu, 5 Feb 2003, Greg Copeland wrote:
> > > Who will actually hold the key? Where will it be physically kept?
> >
> > Good question but can usually be addressed.
>
> It can be addressed, but how well? This is another big issue that I
> don't see any plan for that I'm comfortable with.

The reason I was vague is because it depends on the key route. Obviously, if each person signs, each person must protect their own key. If there is a central project key, it's simply a matter of determining which box is used for signing, etc. ... while important, it's certainly not difficult to address.

> It seems to me extremely difficult to address. Unless you are
> physically monitoring someone, how do you prevent someone from copying
> the key off of that machine? At which point anybody with the passphrase
> can use it for anything.

This issue doesn't change regardless of the mechanism you pick. Anyone that is signing a key must take reasonable measures to ensure the protection of their key.

> > > How many people will know the passphrase?
> >
> > As few as possible. Ideally only two, maybe three core developers.
>
> Um...I'm not sure that this is a relevant question at all. The
> passphrase is not part of the key; it's just used to encrypt the key
> for storage. If you know the passphrase, you can make unlimited copies
> of the key, and these copies can be protected with any passphrases you
> like, or no passphrase, for that matter.

If you're concerned about this to that extent, clearly those people should not be part of the web of trust nor should they be receiving the passphrase nor a copy of the private key. Remember, trust is a key (pun intended) part of a reliable PKI.

> In that case, I would trust only one person with the key. Making copies
> of the key for others gives no additional protection (since it takes
> only one person out of the group to sign the release) while it
> increases the chance of key compromise (since there are now more copies
> of the key kicking around, and more people who know the passphrase).

Which brings us back to backups. Should the one person that has the key be unavailable or dead, who will sign the release? Furthermore, making *limited* copies of the key does provide additional limited protection in case it's lost for some reason. This helps mitigate the use of the revocation key until it's absolutely required. It also provides for backup (of key and people).

Basically, you are saying:
  You trust a core developer.
  You trust they can protect their keys.
  You trust they can properly distribute their trust.
  You don't trust a core developer with a key.

Hmmm...something smells in your web of trust. So, which is it? Do you trust the core developers to protect the interests of the project and the associated key or not? If not, why trust any digital signature from them in the first place? Can't stress this enough: PKI is an absolute failure without trust. Period.

> Keys cannot be transferred from one person to another since, being
> digital data, there's no way to ascertain that the original holder does
> not still (on purpose or inadvertently) have copies of the key. So in
> the case where we want to transfer trust from one person to another, we
> must also generate a new key and revoke the old one.

No one is talking about transferring keys. In fact, I've previously addressed this topic, from a different angle, a number of times. We are talking about shared trust, not transferred trust. The transferring of trust is done by signing keys, not transferring keys.

> This is now exactly equivalent to having each developer sign postgres
> with a signing key (signed by his main key) for which the other
> developers (or appropriate authority) have a revocation certificate.
> And back to the passphrase issue, once again, can't you see that it's
> completely irrelevant? At some point, someone who knows the passphrase
> is going to have to be in a position to use that to decrypt the key. At
> that point he has the key, period. Changing the passphrase does no
> good, because you can't change the passphrase on the copy of the key he
> may have made.

So you trust the core developer to sign the package but you don't trust him to have the key that's required to sign it? You can't have it both ways.

> A passphrase is like a lock on your barn door. After you've given
> someone the key and he's gone in and taken the cow, changing the lock
> gives you no protection at all.

I can assure you I fully understand the implications and meaning of everything I've said.

-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
Merlin Moncure [EMAIL PROTECTED] writes:
> May I make a suggestion that maybe it is time to start thinking about
> tuning the default config file? IMHO it's just a little bit too
> conservative,

It's a lot too conservative. I've been thinking for awhile that we should adjust the defaults.

The original motivation for setting shared_buffers = 64 was so that Postgres would start out-of-the-box on machines where SHMMAX is 1 meg (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data structures). At one time SHMMAX=1M was a pretty common stock kernel setting. But our other data structures blew past the 1/2 meg mark some time ago; at default settings the shmem request is now close to 1.5 meg. So people with SHMMAX=1M have already got to twiddle their postgresql.conf settings, or preferably learn how to increase SHMMAX. That means there is *no* defensible reason anymore for defaulting to 64 buffers.

We could retarget to try to stay under SHMMAX=4M, which I think is the next boundary that's significant in terms of real-world platforms (isn't that the default SHMMAX on some BSDen?). That would allow us 350 or so shared_buffers, which is better, but still not really a serious choice for production work.

What I would really like to do is set the default shared_buffers to 1000. That would be 8 meg worth of shared buffer space. Coupled with more-realistic settings for FSM size, we'd probably be talking a shared memory request approaching 16 meg. This is not enough RAM to bother any modern machine from a performance standpoint, but there are probably quite a few platforms out there that would need an increase in their stock SHMMAX kernel setting before they'd take it.

So what this comes down to is making it harder for people to get Postgres running for the first time, versus making it more likely that they'll see decent performance when they do get it running.

It's worth noting that increasing SHMMAX is not nearly as painful as it was back when these decisions were taken. Most people have moved to platforms where it doesn't even take a kernel rebuild, and we've acquired documentation that tells how to do it on all(?) our supported platforms. So I think it might be okay to expect people to do it.

The alternative approach is to leave the settings where they are, and to try to put more emphasis in the documentation on the fact that the factory-default settings produce a toy configuration that you *must* adjust upward for decent performance. But we've not had a lot of success spreading that word, I think. With SHMMAX too small, you do at least get a pretty specific error message telling you so.

Comments?

regards, tom lane
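For reference, on platforms where it is a runtime tunable, raising SHMMAX is a one-liner; a sketch (numbers illustrative, not recommendations):

    # Linux: raise SHMMAX to 128 MB on the fly, no kernel rebuild
    sysctl -w kernel.shmmax=134217728
    # persist it across reboots
    echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf

    # FreeBSD spells the knob differently
    sysctl kern.ipc.shmmax=134217728

The spelling varies per platform, which is why the documentation pointers Tom mentions matter.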
Re: [HACKERS] Hash grouping, aggregates
On Tue, Feb 11, 2003 at 10:41:53 -0500, Tom Lane [EMAIL PROTECTED] wrote:
> Greg Stark [EMAIL PROTECTED] writes:
> > So one of the items on the TODO list is:
> >
> > * Add hash for evaluating GROUP BY aggregates (Tom)
>
> It's done in CVS tip ... give it a try.
>
> > The neat thing is that hash aggregates would allow grouping on data
> > types that have = operators but no useful < operator.
>
> Hm. Right now I think that would barf on you, because the parser wants
> to find the '<' operator to label the grouping column with, even if the
> planner later decides not to use it. It'd take some redesign of the
> query data structure (specifically SortClause/GroupClause) to avoid
> that.

I think another issue is that for some = operators you still might not be able to use a hash. I would expect the discussion of hash joins in http://developer.postgresql.org/docs/postgres/xoper-optimization.html would apply to hash aggregates as well.
Re: [HACKERS] Hash grouping, aggregates
Bruno Wolff III [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] wrote:
> > Greg Stark [EMAIL PROTECTED] writes:
> > > The neat thing is that hash aggregates would allow grouping on data
> > > types that have = operators but no useful < operator.
> >
> > Hm. Right now I think that would barf on you, because the parser
> > wants to find the '<' operator to label the grouping column with,
> > even if the planner later decides not to use it. It'd take some
> > redesign of the query data structure (specifically
> > SortClause/GroupClause) to avoid that.
>
> I think another issue is that for some = operators you still might not
> be able to use a hash. I would expect the discussion of hash joins in
> http://developer.postgresql.org/docs/postgres/xoper-optimization.html
> would apply to hash aggregates as well.

Right, the = operator must be hashable or you're out of luck. But we could imagine tweaking the parser to allow GROUP BY if it finds a hashable = operator and no sort operator. The only objection I can see to this is that it means the planner *must* use hash aggregation, which might be a bad move if there are too many distinct groups.

regards, tom lane
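To make the corner case concrete, a sketch in SQL (the type and table names are hypothetical stand-ins for any type that has a hashable '=' but no '<', so no sort-based plan can exist):

    -- 'fingerprint' is an imaginary user-defined type with only a
    -- hashable '=' operator and no btree opclass
    CREATE TABLE samples (fp fingerprint, val float8);

    SELECT fp, count(*)
      FROM samples
     GROUP BY fp;

    -- today the parser rejects the GROUP BY for lack of a '<' operator;
    -- with the tweak described above, the planner would have no choice
    -- but a HashAggregate plan, however many distinct groups there are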
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
On Tue, Feb 11, 2003 at 11:20:14AM -0500, Tom Lane wrote:
> ... We could retarget to try to stay under SHMMAX=4M, which I think is
> the next boundary that's significant in terms of real-world platforms
> (isn't that the default SHMMAX on some BSDen?). ...

Assuming 1 page = 4k, and the number of pages is correct in the GENERIC kernel configs, SHMMAX=4M for NetBSD (8M for i386, x86_64).

Cheers,
Patrick
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
> It's a lot too conservative. I've been thinking for awhile that we
> should adjust the defaults.
> <snip>
> The alternative approach is to leave the settings where they are, and
> to try to put more emphasis in the documentation on the fact that the
> factory-default settings produce a toy configuration that you *must*
> adjust upward for decent performance. But we've not had a lot of
> success spreading that word, I think. With SHMMAX too small, you do at
> least get a pretty specific error message telling you so.
>
> Comments?

I'd personally rather have people stumble trying to get PostgreSQL running, up front, rather than allowing the lowest common denominator to more easily run PostgreSQL only to be disappointed with it and move on.

After it's all said and done, I would rather someone simply say, "it's beyond my skill set", and attempt to get help or walk away. That seems better than them being able to run it and say, "it's a dog", spreading word-of-mouth as such after they left PostgreSQL behind. Worse yet, those that do walk away and claim it performs horribly are probably doing more harm to the PostgreSQL community than expecting someone to be able to install software ever can.

Nutshell:
  "Easy to install but horribly slow."
or
  "Took a couple of minutes to configure and it rocks!"

Seems fairly cut-n-dry to me. ;)

Regards,
-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane wrote:
> "Merlin Moncure" [EMAIL PROTECTED] writes:
> > May I make a suggestion that maybe it is time to start thinking about
> > tuning the default config file? IMHO it's just a little bit too
> > conservative,
>
> It's a lot too conservative. I've been thinking for awhile that we
> should adjust the defaults.

One of the things I did on my Windows install was to have a number of default configuration files: postgresql.conf.small, postgresql.conf.medium, postgresql.conf.large. Rather than choose one, the "initdb" script could ask for, or determine, the amount of shared memory, memory, etc.

Another pet peeve I have is forcing the configuration files to be in the database directory. We had this argument in 7.1 days, and I submitted a patch that allowed a configuration file to be specified as a command line parameter. One of the things that Oracle does better is separating the "configuration" from the data. It is an easy patch to allow PostgreSQL to use a separate configuration directory, and specify the data directory within the configuration file (the way any logical application works), and, NO, symlinks are not a solution, they are a kludge.
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane wrote:
<snip>
> What I would really like to do is set the default shared_buffers to
> 1000. That would be 8 meg worth of shared buffer space. Coupled with
> more-realistic settings for FSM size, we'd probably be talking a shared
> memory request approaching 16 meg. This is not enough RAM to bother any
> modern machine from a performance standpoint, but there are probably
> quite a few platforms out there that would need an increase in their
> stock SHMMAX kernel setting before they'd take it.
<snip>

Totally agree with this. We really, really, really, really need to get the default to a point where we have _decent_ default performance.

> The alternative approach is to leave the settings where they are, and
> to try to put more emphasis in the documentation on the fact that the
> factory-default settings produce a toy configuration that you *must*
> adjust upward for decent performance. But we've not had a lot of
> success spreading that word, I think. With SHMMAX too small, you do at
> least get a pretty specific error message telling you so.
>
> Comments?

Yep. Here's an *unfortunately very common* scenario that, again unfortunately, a _seemingly large_ number of people fall for:

a) Someone decides to benchmark database XYZ vs PostgreSQL vs other databases.

b) Said benchmarking person knows very little about PostgreSQL, so they install the RPMs, packages, or whatever, and it works. Then they run whatever benchmark they've downloaded, or designed, or whatever.

c) PostgreSQL, being practically unconfigured, runs at the pace of a slow, mostly-disabled snail.

d) Said benchmarking person gets better performance from the other databases (also set to their default settings) and thinks PostgreSQL has lots of features, and it's free, but it's Too Slow. Yes, this kind of testing shouldn't even _pretend_ to have any real-world credibility.

e) Said benchmarking person tells everyone they know, _and_ everyone they meet, about their results. Some of them even create nice-looking or professional-looking web pages about it.

f) People who know even _less_ than the benchmarking person hear about the test, or read the results, and don't know any better than to believe them at face value. So, they install whatever system was recommended.

g) Over time, the benchmarking person gets the hang of their chosen database and writes further articles about it, and doesn't generally look any further afield than it for, say, a couple of years. By this time, they've already influenced a couple of thousand people in the non-optimal direction.

h) Arrgh.

With better defaults, our next release would _appear_ to be a lot faster to quite a few people, just because they have no idea about tuning. So, as sad as this scenario is, better defaults will probably encourage a lot more newbies to get involved, and that'll eventually translate into a lot more experienced users, and a few more coders to assist. ;-)

Personally I'd be a bunch happier if we set the buffers so high that we definitely have decent performance, and the people that want to run PostgreSQL are forced to make the choice of either:

1) Adjust their system settings to allow PostgreSQL to run properly, or

2) Manually adjust the PostgreSQL settings to run memory-constrained.

This way, PostgreSQL either runs decently, or they are _aware_ that they're limiting it. That should cut down on the false benchmarks (hopefully). :-)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there."
- Indira Gandhi
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
A quick-'n'-dirty first step would be more comments in postgresql.conf. Most of the lines are commented out, which would imply "use the default", but the default is not shown. (I realize this has the difficulty of defaults that change depending upon how PostgreSQL was configured/compiled, but perhaps postgresql.conf could be built by the make process based on the configuration options.)

If postgresql.conf were commented with recommendations it would probably be all I need, though perhaps a recommendation to edit that file should be displayed at the conclusion of "make install".

Cheers,
Steve

On Tuesday 11 February 2003 8:20 am, Tom Lane wrote:
> It's a lot too conservative. I've been thinking for awhile that we
> should adjust the defaults.
> <snip>
> Comments?
>
> regards, tom lane
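Something along these lines would do it (the numbers below are simply the stock 7.3-era defaults as I recall them, shown to illustrate the commenting style being asked for, not suggested values):

    #max_connections = 32      # default: 32
    #shared_buffers = 64       # default: 64 buffers of 8 kB each (min 16)
    #sort_mem = 1024           # default: 1024 kB per sort operation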
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
Justin Clift [EMAIL PROTECTED] writes:
> Personally I'd be a bunch happier if we set the buffers so high that we
> definitely have decent performance, and the people that want to run
> PostgreSQL are forced to make the choice of either:
>
> 1) Adjust their system settings to allow PostgreSQL to run properly, or
>
> 2) Manually adjust the PostgreSQL settings to run memory-constrained.
>
> This way, PostgreSQL either runs decently, or they are _aware_ that
> they're limiting it.

Yeah, that is the subtext here. If you can't increase SHMMAX then you can always trim the postgresql.conf parameters --- but theoretically, at least, you should then have a clue that you're running a badly-configured setup ...

regards, tom lane
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Greg Copeland wrote:
> I'd personally rather have people stumble trying to get PostgreSQL
> running, up front, rather than allowing the lowest common denominator
> to more easily run PostgreSQL only to be disappointed with it and move
> on. After it's all said and done, I would rather someone simply say,
> "it's beyond my skill set", and attempt to get help or walk away. That
> seems better than them being able to run it and say, "it's a dog",
> spreading word-of-mouth as such after they left PostgreSQL behind.
> Worse yet, those that do walk away and claim it performs horribly are
> probably doing more harm to the PostgreSQL community than expecting
> someone to be able to install software ever can.

<RANT>

And that my friends is why PostgreSQL is still relatively obscure. This attitude sucks. If you want a product to be used, you must put the effort into making it usable.

It is a no-brainer to make the default configuration file suitable for the majority of users. It is lunacy to create a default configuration which provides poor performance for over 90% of the users, but which allows the lowest common denominator to work.

A product must not perform poorly out of the box, period. A good product manager would choose one of two possible configurations: (a) a high-speed, fairly optimized system from the get-go, or (b) it does not run unless you create the configuration file. Option (c), out of the box it works like crap, is not an option.

This is why open source gets such a bad reputation: outright contempt for the user who may not know the product as well as those developing it. This attitude really sucks and it turns people off. We want people to use PostgreSQL; to do that we must make PostgreSQL usable. Usability IS important.

</RANT>
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 2003-02-11 at 12:10, Steve Crawford wrote:
> A quick-'n'-dirty first step would be more comments in postgresql.conf.

This will not solve the issue with the large number of users who have no interest in looking at the config file -- but are interested in publishing their results.

-- Rod Taylor [EMAIL PROTECTED]
PGP Key: http://www.rbt.ca/rbtpub.asc
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
> What if we supplied several sample .conf files, and let the user choose
> which to copy into the database directory? We could have a "high read
> performance" profile, and a "transaction database" profile, and a
> "workstation" profile, and a "low impact" profile.

Exactly my first thought when reading the proposal for a setting suited for performance tests. And a ".benchmark" profile :-)

> We could even supply a Perl script that would adjust SHMMAX and SHMALL
> on platforms where this can be done from the command line.

Or maybe configuration could be adjusted with ./configure, if SHMMAX can be determined at that point?

-- Kaare Rasmussen -- Linux, spil -- Tlf: 3816 2582
Kaki Data, tshirts, merchandise; Fax: 3816 2501
Howitzvej 75; Open 12.00-18.00; Email: [EMAIL PROTECTED]
2000 Frederiksberg; Saturday 12.00-16.00; Web: www.suse.dk
Re: [HACKERS] Changing the default configuration (was Re:
On Tue, 2003-02-11 at 11:23, mlw wrote:
> <RANT>
>
> And that my friends is why PostgreSQL is still relatively obscure. This
> attitude sucks. If you want a product to be used, you must put the
> effort into making it usable.

Ah.. okay.

> It is a no-brainer to make the default configuration file suitable for
> the majority of users. It is lunacy to create a default configuration
> which provides poor performance for over 90% of the users, but which
> allows the lowest common denominator to work.

I think you read something into my email which I did not imply. I'm certainly not advocating a default configuration file assuming 512M of shared memory or some such insane value.

Basically, you're arguing that they should keep doing exactly what they are doing. It's currently known to be causing problems and propagating the misconception that PostgreSQL is unable to perform under any circumstance. I'm arguing that who cares if 5% of the potential user base has to learn to properly install software. Either they'll read and learn, ask for assistance, or walk away. All of which are better than a Johnny-come-lately offering up a meaningless benchmark which others are happy to eat with rather large spoons.

> A product must not perform poorly out of the box, period. A good
> product manager would choose one of two possible configurations: (a) a
> high-speed, fairly optimized system from the get-go, or (b) it does not
> run unless you create the configuration file. Option (c), out of the
> box it works like crap, is not an option.

That's the problem. Option (c) is what we currently have. I'm amazed that you even have a problem with option (a), as that's what I'm suggesting. The problem is, potentially for some minority of users, it may not run out of the box. As such, I'm more happy with that situation than with 90% of the user base being stuck with a crappy default configuration. Oddly enough, your option (b) is even worse than what you are ranting at me about. Go figure.

> This is why open source gets such a bad reputation: outright contempt
> for the user who may not know the product as well as those developing
> it. This attitude really sucks and it turns people off. We want people
> to use PostgreSQL; to do that we must make PostgreSQL usable. Usability
> IS important.
>
> </RANT>

There is no contempt here. Clearly you've read your own bias into this thread. If you go back and re-read my posting, I think it's VERY clear that it's entirely about usability.

Regards,
-- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
FYI, my stock Linux 2.4.19 Gentoo kernel has (per sysctl -a):

kernel.shmall = 2097152
kernel.shmmax = 33554432

So it appears that Linux at least is way above your 8 meg point, unless I am missing something.
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
mlw [EMAIL PROTECTED] writes:
> This attitude sucks. If you want a product to be used, you must put the
> effort into making it usable.
> [snip]

AFAICT, you are flaming Greg for recommending the exact same thing you are recommending. Please calm down and read again.

regards, tom lane
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
Jon Griffin [EMAIL PROTECTED] writes:
> So it appears that Linux at least is way above your 8 meg point, unless
> I am missing something.

Yeah, AFAIK all recent Linuxen are well above the range of parameters that I was suggesting (and even if they weren't, Linux is particularly easy to change the SHMMAX setting on). It's other Unixoid platforms that are likely to have a problem. Particularly the ones where you have to rebuild the kernel to change SHMMAX; people may be afraid to do that.

Does anyone know whether cygwin has a setting comparable to SHMMAX, and if so what is its default value? How about the upcoming native Windows port --- any issues there?

regards, tom lane
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 2003-02-11 at 12:08, Justin Clift wrote:
> b) Said benchmarking person knows very little about PostgreSQL, so they
> install the RPMs, packages, or whatever, and it works. Then they run
> whatever benchmark they've downloaded, or designed, or whatever.

Out of curiosity, how feasible is it for the rpm/package/deb/exe maintainers to modify their supplied postgresql.conf settings when building said distribution? AFAIK the minimum default SHMMAX setting on Red Hat 8.0 is 32MB; it seems like bumping shared buffers to work with that amount would be acceptable inside the 8.0 rpm's.

Robert Treat
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
<Apology>

After Mark calms down and, in fact, sees that Greg was saying the right thing after all, chagrin is the only word. I'm sorry.

</Apology>

Greg Copeland wrote:
> There is no contempt here. Clearly you've read your own bias into this
> thread. If you go back and re-read my posting, I think it's VERY clear
> that it's entirely about usability.
> <snip>
[HACKERS] location of the configuration files
The debate on the configuration file sparked a memory of an old patch I submitted in 7.1 days.

One of the things I do not like about PostgreSQL, IMHO, is its backwards configuration process. Rather than specify a data directory, the administrator should specify a database configuration file. Within the configuration file are the location and names of the data directory and other information.

Most admins want a central location for this information. One of the things that is frustrating to me is having to hunt down where the data directory is so that I can administrate a DB. It can be anywhere, in any directory on any volume. If you had, say, a /usr/local/pgsql/admin, then you could have mydb.conf, which could then be checked in to CVS. A standard location for configuration files is the normal practice; a standard location for the data directory is less so. I just don't think the PG data directory should contain configuration information.

The original patch allowed a user to specify the location of the postgresql.conf file, rather than assuming it lived in $PGDATA. Also included in that patch was the ability to specify the location of the PGDATA directory as well as the names of pg_hba.conf and the other configuration files. It also allowed the user to use PG as it has always worked.

The patch was not applied because a better solution was supposedly coming, but that was two major revisions ago. I would still like to see this functionality. Would anyone else?
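Roughly, the patch allowed an invocation along these lines (the flag name, setting names, and paths here are illustrative of the idea, not existing 7.3 options):

    # point the postmaster at a config file instead of a data directory
    postmaster -C /usr/local/pgsql/admin/mydb.conf

    # inside mydb.conf, the data directory becomes just another setting:
    #   data_dir = '/vol3/pgdata/mydb'
    #   hba_conf = '/usr/local/pgsql/admin/mydb_hba.conf'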
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
My other pet peeve is the default max connections setting. This should be higher if possible, but of course there's always the possibility of running out of file descriptors.

Apache has a default max children of 150, and if using PHP or another language that runs as an Apache module, it is quite possible to use up all the pgsql backend slots before using up all the Apache child slots.

Is setting the max connections to something like 200 reasonable, or likely to cause too many problems?
[HACKERS] Windows SHMMAX (was: Default configuration)
> Does anyone know whether cygwin has a setting comparable to SHMMAX, and
> if so what is its default value? How about the upcoming native Windows
> port --- any issues there?

From a pure win32 point of view, a good approach would be to use the VirtualAlloc() memory allocation functions and set up a paged memory allocation system. From a very top-down point of view, this is the method of choice if portability is not an issue. An abstraction to use this technique within the pg context is probably complex and requires writing lots of win32 API code, which is obviously not desirable.

Another way of looking at it is memory-mapped files. This probably most closely resembles unix shared memory and is the de facto standard way for interprocess memory block sharing. Sadly, performance will suffer because you have to rely on the virtual memory system (think: writing to files) to do a lot of stupid stuff you don't necessarily want or need. The OS has to guarantee that the memory can be swapped out to file at any time and therefore mirrors the pagefile to the allocated memory blocks.

With the C/C++ malloc/free API, you are supposed to be able to get some of the benefits of VirtualAlloc (in particular, setting a process memory allocation limit), but personal experience did not bear this out. However, this API sits directly over the virtual allocation system and is the most portable. The application has to guard against fragmentation and things like that in this case.

In win32, server thrashing is public enemy #1 for database servers, mostly due to the virtual allocation system (which is quite fast when used right, btw).

Merlin
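A minimal sketch of the memory-mapped-file approach in win32 C (not PostgreSQL code; the mapping name and size are illustrative). Passing INVALID_HANDLE_VALUE asks the OS to back the mapping with the system paging file rather than a real file, and the name lets a second process attach with OpenFileMapping():

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const DWORD size = 8 * 1024 * 1024;     /* 8 MB, illustrative */
        HANDLE h;
        void *base;

        /* pagefile-backed shared memory block, visible to other processes */
        h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                               PAGE_READWRITE, 0, size,
                               "pgsql-shmem-demo");
        if (h == NULL)
        {
            fprintf(stderr, "CreateFileMapping failed: %lu\n",
                    (unsigned long) GetLastError());
            return 1;
        }

        /* map the whole block into this process's address space */
        base = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
        if (base == NULL)
        {
            fprintf(stderr, "MapViewOfFile failed: %lu\n",
                    (unsigned long) GetLastError());
            CloseHandle(h);
            return 1;
        }

        strcpy((char *) base, "hello");         /* use like ordinary memory */

        UnmapViewOfFile(base);
        CloseHandle(h);
        return 0;
    }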
Re: [HACKERS] location of the configuration files
On Tue, 2003-02-11 at 13:44, mlw wrote:
> The debate on the configuration file sparked a memory of an old patch I
> submitted in 7.1 days.
> <snip>
> The patch was not applied because a better solution was supposedly
> coming, but that was two major revisions ago. I would still like to see
> this functionality. Would anyone else?

I'm going to be lazy and ask if you can post what the "better solution that was coming" was (or a link to the thread). While I'll grant you that the "it's coming" argument is pretty weak after two releases, the fact that it may have been a better solution could still hold up.

Robert Treat
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
scott.marlowe [EMAIL PROTECTED] writes:
> Is setting the max connections to something like 200 reasonable, or
> likely to cause too many problems?

That would likely run into number-of-semaphores limitations (SEMMNI, SEMMNS). We do not seem to have as good documentation about changing that as we do about changing the SHMMAX setting, so I'm not sure I want to buy into the "it's okay to expect people to fix this before they can start Postgres the first time" argument here.

Also, max-connections doesn't silently skew your testing: if you need to raise it, you *will* know it.

regards, tom lane
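For what it's worth, on Linux the semaphore limits are also a runtime tunable; a sketch (values illustrative, not recommendations; the four fields follow the usual SEMMSL SEMMNS SEMOPM SEMMNI ordering):

    # kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
    sysctl -w kernel.sem="250 32000 32 128"

Since Postgres consumes semaphores roughly in proportion to max_connections, SEMMNS and SEMMNI are the values that bite first when raising it.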
Re: [HACKERS] location of the configuration files
Robert Treat wrote: I'm going to be lazy and ask if you can post what the better solution that was coming was (or a link to the thread). While I'll grant you that the "it's coming" argument is pretty weak after two releases, the fact that it may have been a better solution could still hold up. Robert Treat AFAIK it wasn't actually done. It was more of a "we should do something different" argument. At one point there was talk of rewriting the configuration system to allow include directives and other things. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Changing the default configuration (was Re:
On Tue, 2003-02-11 at 12:55, Tom Lane wrote: scott.marlowe [EMAIL PROTECTED] writes: Is setting the max connections to something like 200 reasonable, or likely to cause too many problems? That would likely run into number-of-semaphores limitations (SEMMNI, SEMMNS). We do not seem to have as good documentation about changing that as we do about changing the SHMMAX setting, so I'm not sure I want to buy into the "it's okay to expect people to fix this before they can start Postgres the first time" argument here. Also, max-connections doesn't silently skew your testing: if you need to raise it, you *will* know it. Besides, I'm not sure that it makes sense to let other product needs dictate the default configurations for this one. It would be one thing if the vast majority of people only used PostgreSQL with Apache. I know I'm using it in environments which in no way relate to the web. I'm thinking I'm not alone. Regards, -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Windows SHMMAX (was: Default configuration)
On Tue, 2003-02-11 at 12:49, Merlin Moncure wrote: Does anyone know whether cygwin has a setting comparable to SHMMAX, and if so what is its default value? How about the upcoming native Windows port --- any issues there? From a pure win32 point of view, a good approach would be to use the VirtualAlloc() memory allocation functions and set up a paged memory allocation system. From a very top down point of view, this is the method of choice if portability is not an issue. An abstraction to use this technique within pg context is probably complex and requires writing lots of win32 api code, which is obviously not desirable. Another way of looking at it is memory mapped files. This probably most closely resembles unix shared memory and is the de facto standard way for interprocess memory block sharing. Sadly, performance will suffer because you have to rely on the virtual memory system (think: writing to files) to do a lot of stupid stuff you don't necessarily want or need. The OS has to guarantee that the memory can be swapped out to file at any time and therefore mirrors the pagefile to the allocated memory blocks. With the C++/C memory malloc/free api, you are supposed to be able to get some of the benefits of virtual alloc (in particular, setting a process memory allocation limit), but personal experience did not bear this out. However, this api sits directly over the virtual allocation system and is the most portable. The application has to guard against fragmentation and things like that in this case. In win32, server thrashing is public enemy #1 for database servers, mostly due to the virtual allocation system (which is quite fast when used right, btw). IIRC, there is a mechanism which enables a memory mapped file to be backed directly by the pagefile. This is the preferred way to use memory mapped files unless you have a specific need which dictates otherwise. Meaning, it allows the OS to apply many supposed optimizations, as it is supposed to bypass some of the filesystem overhead. Regards, -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
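The pagefile-backed mechanism referred to above is CreateFileMapping() with INVALID_HANDLE_VALUE passed in place of a real file handle: the mapping is backed by the system paging file, no on-disk file is created, and cooperating processes can attach to it by name. A minimal sketch, with an invented mapping name and no error recovery:

#include <windows.h>

int main(void)
{
    HANDLE hMap;
    char  *shm;

    /* INVALID_HANDLE_VALUE => back the mapping with the pagefile. */
    hMap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                             PAGE_READWRITE, 0, 1024 * 1024,
                             "pg_shmem_demo");
    if (hMap == NULL)
        return 1;

    /* Any process that opens "pg_shmem_demo" sees the same pages. */
    shm = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (shm == NULL)
        return 1;

    shm[0] = 'x';

    UnmapViewOfFile(shm);
    CloseHandle(hMap);
    return 0;
}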
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane wrote: I think that what this discussion is really leading up to is that we are going to decide to apply the same principle to performance. The out-of-the-box settings ought to give reasonable performance, and if your system can't handle it, you should have to take explicit action to acknowledge the fact that you aren't going to get reasonable performance. What I don't understand is why this is such a huge issue. Set it to a reasonable level (be it 4M or whatever the consensus is) and let the packagers worry about it if that's not appropriate. Isn't it their job to provide a good out-of-the-package experience? Won't they have better knowledge of what the system limits are for the packages they develop for? Worst case, couldn't they have a standard conf package and a special high-performance conf package in addition to all the base packages? After all, it's the users of the RPMs that are the real problem, not usually the people that compile it on their own. If you were having problems with the compile-it-yourself audience, couldn't you just hit them over the head three or four times (configure, install, initdb, failed startup, to name a few) reminding them to change it if it wasn't appropriate? What more can you really do? At some point, the end user has to bear some responsibility... -- Jeff Hoffmann PropertyKey.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Changing the default configuration (was Re:
On 11 Feb 2003, Greg Copeland wrote: On Tue, 2003-02-11 at 12:55, Tom Lane wrote: scott.marlowe [EMAIL PROTECTED] writes: Is setting the max connections to something like 200 reasonable, or likely to cause too many problems? That would likely run into number-of-semaphores limitations (SEMMNI, SEMMNS). We do not seem to have as good documentation about changing that as we do about changing the SHMMAX setting, so I'm not sure I want to buy into the "it's okay to expect people to fix this before they can start Postgres the first time" argument here. Also, max-connections doesn't silently skew your testing: if you need to raise it, you *will* know it. Besides, I'm not sure that it makes sense to let other product needs dictate the default configurations for this one. It would be one thing if the vast majority of people only used PostgreSQL with Apache. I know I'm using it in environments which in no way relate to the web. I'm thinking I'm not alone. True, but even so, 32 max connections is a bit light. I have more pgsql databases than that on my box now. My point in my previous answer to Tom was that you HAVE to shut down postgresql to change this. It doesn't allocate tons of semaphores on startup, just when the child processes are spawned, and I'd rather have the user adjust their OS to meet the higher need than have to shut down and restart postgresql as well. This is one of the settings that make it feel like a toy when you first open it. How many other high quality databases in the whole world restrict max connections to 32? The original choice of 32 was made because 64 shared memory blocks was the most we could hope for on common OS installs. Now that we're looking at cranking that up to 1000, shouldn't max connections get a look too? You don't have to be using apache to need more than 32 simo connections. Heck, how many postgresql databases do you figure are in production with that setting still in there? My guess is not many. I'm not saying we should do this to make benchmarks better either, I'm saying we should do it to improve the user experience. A limit of 32 connects makes things tough for a beginning DBA: not only does he find out the problem while his database is under load the first time, but then he can't fix it without shutting down and restarting postgresql. If the max is set to 200 or 500 and he starts running out of semaphores, that's a problem he can address while his database is still up and running in most operating systems, at least in the ones I use. So, my main point is that for any setting that requires you to shut down postgresql to change, we should pick a compromise value that means you never likely will have to shut down the database once you've started it up and it's under load. shared buffers, max connects, etc... should not need tweaking for 95% or more of the users if we can help it. It would be nice if we could find a set of numbers that reduce the number of problems users have, so all I'm doing is looking for the sweetspot, which is NOT 32 max connections. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
FW: [HACKERS] Changing the default configuration (was Re:
True, but even so, 32 max connections is a bit light. I have more pgsql databases than that on my box now. My point in my previous answer to Tom was that you HAVE to shut down postgresql to change this. It doesn't allocate tons of semaphores on startup, [snip] is this correct? I recall looking through the source and seeing comments to the effect that it is better to allocate them all (semaphores) up front in order to prevent runtime failed allocations. (could be totally off base on this). You don't have to be using apache to need more than 32 simo connections. Heck, how many postgresql databases do you figure are in production with that setting still in there? My guess is not many. [snip] True, and it is not unheard of to put minimum specs for version x of the database, i.e. 7.4 requires kernel 2.x and so on. Here's the comment I was referring to:

/*
 * InitProcGlobal -
 *    initializes the global process table. We put it here so that
 *    the postmaster can do this initialization.
 *
 *    We also create all the per-process semaphores we will need to support
 *    the requested number of backends. We used to allocate semaphores
 *    only when backends were actually started up, but that is bad because
 *    it lets Postgres fail under load --- a lot of Unix systems are
 *    (mis)configured with small limits on the number of semaphores, and
 *    running out when trying to start another backend is a common failure.
 *    So, now we grab enough semaphores to support the desired max number
 *    of backends immediately at initialization --- if the sysadmin has set
 *    MaxBackends higher than his kernel will support, he'll find out sooner
 *    rather than later.
 *
 *    Another reason for creating semaphores here is that the semaphore
 *    implementation typically requires us to create semaphores in the
 *    postmaster, not in backends.
 */

Merlin ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
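The fail-early behavior that comment describes is easy to demonstrate in miniature: ask for all the semaphore sets up front and see whether the kernel obliges. A toy probe, assuming the historical 16 semaphores per set (all numbers are illustrative, and the probe removes its sets afterwards):

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    int max_backends = 200;
    int nsets = (max_backends + 15) / 16;   /* one set per 16 backends */
    int ids[64];                            /* assumes nsets <= 64 */
    int i;

    for (i = 0; i < nsets; i++)
    {
        ids[i] = semget(IPC_PRIVATE, 16, IPC_CREAT | 0600);
        if (ids[i] < 0)
        {
            perror("semget");               /* SEMMNI/SEMMNS too low */
            break;
        }
    }
    printf("got %d of %d semaphore sets\n", i, nsets);

    while (--i >= 0)
        semctl(ids[i], 0, IPC_RMID);        /* clean up the probe */
    return 0;
}

On a kernel with stock semaphore limits this fails immediately, which is precisely the point of doing the allocation in the postmaster rather than lazily in the backends.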
Re: [HACKERS] new procedural language - PL/R
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: 2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and assuming plr is released under GPL, is there any chance that it would be accepted into src/pl or contrib, or should I start a gborg project (I'd prefer if it could at least live in contrib)? I think we'd have to insist on gborg. The only reason there are any non-BSD-license items left in contrib is that I haven't finished my TODO item to get their licenses changed or remove 'em. [...snip...] BSD would be good. I agree that it'll be a pain in the neck to maintain a PL that is not in the main tree, so I'd support accepting it if we can get the license right. I finally got a response from one of the core R developers: libR is GPL-ed, and that is unlikely to change -- so I guess gborg it is :-( (not that I have anything against gborg ;-)) Before making any release announcements, I'd be interested in feedback if anyone feels so inclined. The source is currently available here: http://www.joeconway.com/plr/plr.0.1.1.alpha.tar.gz The documentation, including preprocessed html, is in the tar ball. I've also posted the html docs here: http://www.joeconway.com/plr/index.html From the README (more or less): --- Installation: Place the tar file in 'contrib' in the PostgreSQL source tree and untar it. Then run:

make
make install
make installcheck

You can use plr.sql to create the language and functions in your database of choice, e.g.

psql mydatabase < plr.sql

--- In addition to the documentation, the plr.out file in plr/expected is a good source of usage examples. PL/R should build cleanly with PostgreSQL 7.3.x and cvs HEAD. It was developed using libR from R 1.6.2 under Red Hat 7.3 and 8.0 -- I've not tested against other versions of R or different OSes. Please let me know how it goes. Thanks, Joe ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] win32 port
Has a final decision been made if the win32 port is going to be threaded or not? Merlin
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tuesday 11 February 2003 13:03, Robert Treat wrote: On Tue, 2003-02-11 at 12:08, Justin Clift wrote: b) Said benchmarking person knows very little about PostgreSQL, so they install the RPM's, packages, or whatever, and it works. Then they run whatever benchmark they've downloaded, or designed, or whatever Out of curiosity, how feasible is it for the rpm/package/deb/exe maintainers to modify their supplied postgresql.conf settings when building said distribution? AFAIK the minimum default SHMMAX setting on Red Hat 8.0 is 32MB, seems like bumping shared buffers to work with that amount would be acceptable inside the 8.0 rpm's. Yes, this is easy to do. But what is a sane default? I can patch any file I'd like to, but my preference is to patch as little as possible, as I'm trying to be generic here. I can't assume Red Hat 8 in the source RPM, and my binaries are to be preferred only if the distributor doesn't have updated ones. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Report on Japan, Belgium trip
I have just returned from a two-week speaking trip to Japan and Belgium. I spoke at three locations in Tokyo, and at FOSDEM in Brussels. I have updated my home page to list all the speeches, including one on replication and a new marketing one: http://candle.pha.pa.us/main/writings/computer.html I had about 20-150 people at each presentation. I had very few questions about PostgreSQL vs. MySQL --- seems this question has been answered in our favor. I did get lots of questions about PostgreSQL vs. Oracle, which is the right question. ;-) FOSDEM was very successful. I think we will do it again next year. Actually, I would like to expand our geographic coverage. We have O'Reilly for North America, and FOSDEM for Europe. It would be good to have something in Asia, Australia, and perhaps Russia and Canada. Ideally, we can provide 6-12 hours of content as part of an event that has other presentations that would be of interest to PostgreSQL folks. FOSDEM and O'Reilly meet these criteria. If you know of such an event, please let me know or contact the event organizers and ask if they want PostgreSQL as part of their event. Let me share one question I got often during my trip, Why should I use PostgreSQL? In every case, my answer was that you should use PostgreSQL because we have amassed a group of developers and committed users that no other company can match. The open source development model allows us to improve PostgreSQL at an amazing rate. If you look at where we were seven years ago, and where we are now, it is impossible to overestimate how advanced we will be seven years from now. So, I said, don't choose PostgreSQL only for where it is now, but for where it is going. I am home for ten days, then head to China and Japan for another two weeks. Those visits are not to attend conferences, but to meet with companies using PostgreSQL. I am taking my ten-year-old son with me. I will start catching up on my email now. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Maximum Size for Large Object / TOASTed Object
Hi All, What is the maximum size a large object can be in PostgreSQL? What is the maximum size a TOASTed object can be in PostgreSQL? The PostgreSQL Limitations page says the maximum size of a field is 1 Gb, but does a large object constitute a field? :) Thanks, Paul -- Paul Ramsey | Refractions Research | Email: [EMAIL PROTECTED] | Phone: (250) 885-0632 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Maximum Size for Large Object / TOASTed Object
On Tue, 11 Feb 2003, Paul Ramsey wrote: Hi All, What is the maximum size a large object can be in PostgreSQL? What is the maximum size a TOASTed object can be in PostgreSQL? The PostgreSQL Limitations page says the maximum size of a field is 1 Gb, but does a large object constitute a field? :) Thanks, Paul I don't know offhand, but large objects are stored inside the database itself (in the pg_largeobject system table as of 7.1), so I presume any limit is going to come from there. A large object isn't a field; the large object id can, and very probably should, be stored in one though. -- Nigel J. Andrews ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
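For reference, storing the large object id in a field looks roughly like this with the libpq large object API. A minimal sketch -- the connection string and the docs table are made up for the example, and error handling is mostly elided:

#include <stdio.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>     /* INV_READ / INV_WRITE */

int main(void)
{
    PGconn *conn;
    Oid     loid;
    int     fd;
    char    sql[128];

    conn = PQconnectdb("dbname=mydb");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Large object operations must run inside a transaction. */
    PQclear(PQexec(conn, "BEGIN"));

    loid = lo_creat(conn, INV_READ | INV_WRITE);
    fd = lo_open(conn, loid, INV_WRITE);
    lo_write(conn, fd, "hello", 5);
    lo_close(conn, fd);

    /* Store the OID in an ordinary column so the object can be found. */
    snprintf(sql, sizeof(sql), "INSERT INTO docs(body) VALUES (%u)",
             (unsigned int) loid);
    PQclear(PQexec(conn, sql));

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}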
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 2003-02-11 at 10:20, Tom Lane wrote: Merlin Moncure [EMAIL PROTECTED] writes: May I make a suggestion that maybe it is time to start thinking about tuning the default config file, IMHO it's just a little bit too conservative. It's a lot too conservative. I've been thinking for awhile that we should adjust the defaults. The original motivation for setting shared_buffers = 64 was so that Postgres would start out-of-the-box on machines where SHMMAX is 1 meg (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data structures). At one time SHMMAX=1M was a pretty common stock kernel setting. But our other data structures blew past the 1/2 meg mark some time ago; at default settings the shmem request is now close to 1.5 meg. So people with SHMMAX=1M have already got to twiddle their postgresql.conf settings, or preferably learn how to increase SHMMAX. That means there is *no* defensible reason anymore for defaulting to 64 buffers. We could retarget to try to stay under SHMMAX=4M, which I think is the next boundary that's significant in terms of real-world platforms (isn't that the default SHMMAX on some BSDen?). That would allow us 350 or so shared_buffers, which is better, but still not really a serious choice for production work. What I would really like to do is set the default shared_buffers to 1000. That would be 8 meg worth of shared buffer space. Coupled with more-realistic settings for FSM size, we'd probably be talking a shared memory request approaching 16 meg. This is not enough RAM to bother any modern machine from a performance standpoint, but there are probably quite a few platforms out there that would need an increase in their stock SHMMAX kernel setting before they'd take it. So what this comes down to is making it harder for people to get Postgres running for the first time, versus making it more likely that they'll see decent performance when they do get it running. It's worth noting that increasing SHMMAX is not nearly as painful as it was back when these decisions were taken. Most people have moved to platforms where it doesn't even take a kernel rebuild, and we've acquired documentation that tells how to do it on all(?) our supported platforms. So I think it might be okay to expect people to do it. The alternative approach is to leave the settings where they are, and to try to put more emphasis in the documentation on the fact that the factory-default settings produce a toy configuration that you *must* adjust upward for decent performance. But we've not had a lot of success spreading that word, I think. With SHMMAX too small, you do at least get a pretty specific error message telling you so. Comments? I'd personally rather have people stumble trying to get PostgreSQL running, up front, rather than allow the lowest common denominator to more easily run PostgreSQL only to be disappointed with it and move on. After it's all said and done, I would rather someone simply say, "it's beyond my skill set", and attempt to get help or walk away. That seems better than them being able to run it and say, "it's a dog", spreading word-of-mouth as such after they left PostgreSQL behind. Worse yet, those that do walk away and claim it performs horribly are probably doing more harm to the PostgreSQL community than expecting someone to be able to install software ever can. Nutshell: "Easy to install but horribly slow." or "Took a couple of minutes to configure and it rocks!" Seems fairly cut-n-dry to me.
;) The type of person who can't configure it or doesn't think to try is probably not doing a project that requires any serious performance. As long as you are running it on decent hardware postgres will run fantastically for anything but a very heavy load. I think there may be many people out there who have little experience but want an RDBMS to manage their data. Those people need something very, very easy. Look at the following that mysql gets despite how poor a product it is. It's very, very easy. Mysql works great for many people's needs, but then when they need to do something real they need to move to a different database entirely. I think there is a huge advantage to having a product that can be set up very quickly out of the box. Those who need serious performance, hopefully for their employer's sake, will be more likely to take a few minutes to do some quick performance tuning. Rick Gigger ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] PGP signing release
On Wed, 11 Feb 2003, Greg Copeland wrote: On Wed, 2003-02-05 at 18:53, Curt Sampson wrote: [Re: everybody sharing a single key] This issue doesn't change regardless of the mechanism you pick. Anyone that is signing a key must take reasonable measures to ensure the protection of their key. Right. Which is why you really want to use separate keys: you can determine who compromised a key if it is compromised, and you can revoke one without having to revoke all of them. Which pretty much inevitably leads you to just having the developers use their own personal keys to sign the release. Basically, you are saying:

  - You trust a core developer
  - You trust they can protect their keys
  - You trust they can properly distribute their trust
  - You don't trust a core developer with a key

Not at all. I trust core developers with keys, but I see no reason to weaken the entire system by sharing keys when it's not necessary. Having each developer sign the release with his own personal key solves every problem you've brought up. cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Changing the default configuration (was Re:
On Tue, 11 Feb 2003, Merlin Moncure wrote: Here's the comment I was referring to:

/*
 * InitProcGlobal -
 *    initializes the global process table. We put it here so that
 *    the postmaster can do this initialization.
 *
 *    We also create all the per-process semaphores we will need to support
 *    the requested number of backends. We used to allocate semaphores
 *    only when backends were actually started up, but that is bad because
 *    it lets Postgres fail under load --- a lot of Unix systems are
 *    (mis)configured with small limits on the number of semaphores, and
 *    running out when trying to start another backend is a common failure.
 *    So, now we grab enough semaphores to support the desired max number
 *    of backends immediately at initialization --- if the sysadmin has set
 *    MaxBackends higher than his kernel will support, he'll find out sooner
 *    rather than later.
 *
 *    Another reason for creating semaphores here is that the semaphore
 *    implementation typically requires us to create semaphores in the
 *    postmaster, not in backends.
 */

Interesting. I was looking at the max number of file handles, but not semaphores. I don't have to adjust the sem settings until I break 2047 connections, about 10 times what I want to set the default to. With max connections set to 200 and buffers set to 1000, I pretty much can't run out of system resources on my box, only postgresql resources. My box running RH72 has about 6500 free file handles out of the default 8192 left when running 200 simo transactions with pgbench, and the 200 max connects setting would account for about 10% of the shared semaphore max on the box, well under what I'd worry about. So, what OSes would have a problem, if any, with boosting something like max connects to 200? What are the breaking points on other OSes? I just downloaded FreeBSD 5.0 yesterday, so I'll try to get it installed and tested as well. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
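For anyone wanting to sanity-check the same numbers on their own box, the per-process descriptor budget can be queried in a few lines of C (the system-wide table -- e.g. fs.file-max on Linux -- is a separate limit):

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    struct rlimit rl;

    /* Per-process limit; multiply by max_connections for a worst case. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("per-process fds: soft %ld, hard %ld\n",
               (long) rl.rlim_cur, (long) rl.rlim_max);
    printf("sysconf(_SC_OPEN_MAX) = %ld\n", (long) sysconf(_SC_OPEN_MAX));
    return 0;
}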
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 11 Feb 2003, Tom Lane wrote: It's a lot too conservative. I've been thinking for awhile that we should adjust the defaults. Some of these issues could be made to Just Go Away with some code changes. For example, using mmap rather than SysV shared memory would automatically optimize your memory usage, and get rid of the double-buffering problem as well. If we could find a way to avoid using semaphores proportional to the number of connections we have, then you wouldn't have to worry about that configuration parameter, either. In fact, some of this stuff might well improve our portability, too. For example, mmap is a POSIX standard, whereas shmget is only an X/Open standard. That makes me suspect that mmap is more widely available on non-Unix platforms. (But I could be wrong.) cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
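The mmap model is easy to see in miniature: an anonymous shared mapping is inherited across fork(), which matches the postmaster/backend process structure, and it never touches SHMMAX. A bare-bones sketch (the MAP_ANON vs. MAP_ANONYMOUS spelling varies a little across platforms):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 8 * 1024 * 1024;
    char  *shm;

    /* Anonymous + shared: pages are visible to children after fork(). */
    shm = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANON, -1, 0);
    if (shm == MAP_FAILED)
        return 1;

    if (fork() == 0)
    {
        strcpy(shm, "written by child");
        _exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", shm);   /* prints the child's write */

    munmap(shm, size);
    return 0;
}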
Re: [HACKERS] PGP signing release
On Tue, 2003-02-11 at 18:27, Curt Sampson wrote: On Wed, 11 Feb 2003, Greg Copeland wrote: On Wed, 2003-02-05 at 18:53, Curt Sampson wrote: [Re: everybody sharing a single key] This issue doesn't change regardless of the mechanism you pick. Anyone that is signing a key must take reasonable measures to ensure the protection of their key. Right. Which is why you really want to use separate keys: you can determine who compromised a key if it is compromised, and you can revoke one without having to revoke all of them. Which pretty much inevitably leads you to just having the developers use their own personal keys to sign the release. Basically, you are saying:

  - You trust a core developer
  - You trust they can protect their keys
  - You trust they can properly distribute their trust
  - You don't trust a core developer with a key

Not at all. I trust core developers with keys, but I see no reason to weaken the entire system by sharing keys when it's not necessary. Having each developer sign the release with his own personal key solves every problem you've brought up. cjs You need to keep in mind, I've not been advocating, rather, clarifying. The point being, having a shared key between trusted core developers is hardly an additional risk. After all, either they can be trusted or they can't. At this point, I think we both understand where the other stands. Either we agree or agree to disagree. The next step is for the developers to decide which path they prefer to enforce and to ensure they have the tools and knowledge at hand to support it. Anyone know if Tom and Bruce know each other well enough to sign each other's keys outright, via phone, via phone and snail-mail? That would put us off to an excellent start. Regards, -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 11 Feb 2003, Rick Gigger wrote: The type of person who can't configure it or doesn't think to try is probably not doing a project that requires any serious performance. As long as you are running it on decent hardware postgres will run fantastically for anything but a very heavy load. I think there may be many people out there who have little experience but want an RDBMS to manage their data. Those people need something very, very easy. Look at the following that mysql gets despite how poor a product it is. It's very, very easy. Mysql works great for many people's needs, but then when they need to do something real they need to move to a different database entirely. I think there is a huge advantage to having a product that can be set up very quickly out of the box. Those who need serious performance, hopefully for their employer's sake, will be more likely to take a few minutes to do some quick performance tuning. Very good point. I'm pushing for changes that will NOT negatively impact joe beginner on the major platforms (Linux, BSD, Windows) in terms of install. I figure anyone installing on big iron already knows enough about their OS that we don't have to worry about shared buffers being too big for that machine. So, a compromise of faster performance out of the box, with little or no negative user impact, seems the sweet spot here. I'm thinking a good knee setting for each one, where not too much memory / semaphores / file handles get gobbled up, but the database isn't pokey. The poor performance of Postgresql in its current default configuration HAS cost us users, trust me, I know a few we've almost lost where I work that I converted after some quick tweaking of their database. In its stock form Postgresql is very slow at large simple queries, like "select * from table1 t1 natural join table2 t2 where t1.field='a'", where you get back something like 10,000 rows. The real bottleneck here is sort_mem. A simple bump up to 8192 or so makes the database much more responsive. If we're looking at changing default settings for 7.4, then we should look at changing ALL of them that matter, since we'll have the most time to shake out problems if we do them early, and we won't have four or five rounds of setting different defaults over time and finding the limitations of the HOST OSes one at a time. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Changing the default configuration
My other pet peeve is the default max connections setting. This should be higher if possible, but of course, there's always the possibility of running out of file descriptors. Apache has a default max children of 150, and if using PHP or another language that runs as an apache module, it is quite possible to use up all the pgsql backend slots before using up all the apache child slots. Is setting the max connections to something like 200 reasonable, or likely to cause too many problems? It is likely. First you will run out of kernel file descriptors. This could be solved by increasing the kernel table or lowering max_files_per_process, though. Second, the total throughput will rapidly decrease if you don't have enough RAM and many CPUs. PostgreSQL can not handle many concurrent connections/transactions effectively. I recommend employing some kind of connection pooling software and lowering the max connections. -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Wed, 12 Feb 2003, Curt Sampson wrote: On Tue, 11 Feb 2003, Tom Lane wrote: It's a lot too conservative. I've been thinking for awhile that we should adjust the defaults. Some of these issues could be made to Just Go Away with some code changes. For example, using mmap rather than SysV shared memory would automatically optimize your memory usage, and get rid of the double-buffering problem as well. If we could find a way to avoid using semephores proportional to the number of connections we have, then you wouldn't have to worry about that configuration parameter, either. In fact, some of this stuff might well improve our portability, too. For example, mmap is a POSIX standard, whereas shmget is only an X/Open standard. That makes me suspect that mmap is more widely available on non-Unix platforms. (But I could be wrong.) I'll vote for mmap. I use the mm libs with apache/openldap/authldap and it is very fast and pretty common nowadays. It seems quite stable as well. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, Feb 11, 2003 at 17:42:06 -0700, scott.marlowe [EMAIL PROTECTED] wrote: The poor performance of Postgresql in it's current default configuration HAS cost us users, trust me, I know a few we've almost lost where I work that I converted after some quick tweaking of their database. About two years ago I talked some people into trying it at work to use with IMP/Horde which had been having some corruption problems while using MySQL (though it wasn't necessarily a problem with MySQL). I told them to be sure to use 7.1. When they tried it out it couldn't keep up with the load. I asked the guys what they tried and found out they couldn't find 7.1 rpms and didn't want to compile from source and so ended up using 7.0.?. Also as far as I could tell from talking to them, they didn't do any tuning at all. They weren't interested in taking another look at it after that. We are still using MySQL with that system today. One of our DBAs is using it for some trial projects (including one for me) even though we have a site license for Oracle. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Changing the default configuration
On Wed, 12 Feb 2003, Tatsuo Ishii wrote: My other pet peeve is the default max connections setting. This should be higher if possible, but of course, there's always the possibility of running out of file descriptors. Apache has a default max children of 150, and if using PHP or another language that runs as an apache module, it is quite possible to use up all the pgsql backend slots before using up all the apache child slots. Is setting the max connections to something like 200 reasonable, or likely to cause too many problems? It is likely. First you will run out of kernel file descriptors. This could be solved by increasing the kernel table or lowering max_files_per_process, though. Second, the total throughput will rapidly decrease if you don't have enough RAM and many CPUs. PostgreSQL can not handle many concurrent connections/transactions effectively. I recommend employing some kind of connection pooling software and lowering the max connections. Don't know if you saw my other message, but increasing max connects to 200 used about 10% of all my semaphores and about 10% of my file handles. That was while running pgbench to create 200 simo sessions. Keep in mind, on my fairly small intranet database server, I routinely have 32 connections, most coming from outside my webserver. Probably no more than 4 or 5 connects at a time come from there. These are all things like Windows boxes with ODBC running access or something similar. Many of the connections are idle 98% of the time, and use little or no real resources, even getting swapped out should the server need the spare memory (it doesn't :-) that machine is set to 120 max simos if I remember correctly. While 200 may seem high, 32 definitely seems low. So, what IS a good compromise for this and ALL the other settings that should probably be a bit higher? I'm guessing sort_mem of 4 or 8 meg hits the knee for most folks, and the max fsm settings tom has suggested make sense. What wal_sync method should we make default? Or should we pick one based on the OS the user is running? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] Contract Programmer Advice.
Being that this group of hackers is one I trust, and that this is a pretty common scenario for contract programming, I thought I'd ask this group a question. I hope you don't mind. I did some contract programming work for a to-be-nameless company a while back, and I'm having difficulty collecting. The amount of the debt is around $1,400 US, so it's not something I am willing to just let go. However, I've been patient for nearly two years: and, despite repeated promises to the contrary, I have yet to see this money. They did pay me for the first $1,000 or so, but have that $1,400 left to pay (I did a total of about $2,400 for them). It was a subcontracting arrangement; they were contracting out to a client, and I was subcontracted. According to the client, the bill was paid in full. This company doesn't dispute any of my invoices and says they are going to pay me. But they have not yet done so. This company is still in business, and seems to be doing quite well. Don't even ask their name; I'm not out to smear anyone. Although I am tempted to publicly announce who they are, it is against my nature. But my patience is wearing thin; and I'm willing to try just about anything. Short of libel or slander, of course. Whether they are honest or not doesn't mean I have carte blanche to be dishonest. I just want my money _soon_. This company has had no complaints about my performance or the quality of my work in this; in fact, in the communications I've had with them, they were always pleased with my work (or at least that's what they said). And, of course, I kept copies of all correspondence. In fact, I would do work for them again if I could be more sure of collecting the money. Maybe I'm just too patient; I don't know. Since I do trust this group, if any of you have had experience dealing with credit collection bureaus and the like, can you please e-mail me privately with your experience? -- Lamar Owen WGCR Internet Radio 1 Peter 4:11 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Changing the default configuration
It is likely. First you will run out of kernel file descriptors. This could be solved by increasing the kernel table or lowering max_files_per_process, though. Second, the total throughput will rapidly decrease if you don't have enough RAM and many CPUs. PostgreSQL can not handle many concurrent connections/transactions effectively. I recommend employing some kind of connection pooling software and lowering the max connections. Don't know if you saw my other message, but increasing max connects to 200 used about 10% of all my semaphores and about 10% of my file handles. That was while running pgbench to create 200 simo sessions. I'm not talking about semaphores. You see, the low usage of file descriptors is just because pgbench uses very few tables. Keep in mind, on my fairly small intranet database server, I routinely have 32 connections, most coming from outside my webserver. Probably no more than 4 or 5 connects at a time come from there. These are all things like Windows boxes with ODBC running access or something similar. Many of the connections are idle 98% of the time, and use little or no real resources, even getting swapped out should the server need the spare memory (it doesn't :-) that machine is set to 120 max simos if I remember correctly. While 200 may seem high, 32 definitely seems low. So, what IS a good compromise for this and ALL the other settings that should probably be a bit higher? I'm guessing sort_mem of 4 or 8 meg hits the knee for most folks, and the max fsm settings tom has suggested make sense. 32 is not too low if the kernel file descriptor limit is not increased. Beware that running out of kernel file descriptors is a serious problem for the entire system, not only for PostgreSQL. What wal_sync method should we make default? Or should we pick one based on the OS the user is running? It really depends on the OS or kernel version. I saw open_sync perform best for certain versions of the Linux kernel, while fdatasync is good for others. I'm not sure, but it could be possible that the file system type might affect the wal_sync choice. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
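Since the best wal_sync method genuinely varies by kernel and filesystem, the only reliable answer is to measure on the target box. A crude timing loop along these lines (swap fdatasync() for fsync(), or add O_SYNC to the open flags, to compare the candidates; the path and counts are arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    char buf[8192] = {0};
    struct timeval t0, t1;
    int i, fd;

    fd = open("/tmp/walsync.test", O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return 1;

    gettimeofday(&t0, NULL);
    for (i = 0; i < 1000; i++)
    {
        (void) write(fd, buf, sizeof(buf)); /* one WAL-page-sized write */
        fdatasync(fd);                      /* sync method under test */
        lseek(fd, 0, SEEK_SET);
    }
    gettimeofday(&t1, NULL);

    printf("%.3f seconds for 1000 sync cycles\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    close(fd);
    unlink("/tmp/walsync.test");
    return 0;
}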
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
Why don't we include a postgresql.conf.recommended along with our postgresql.conf.sample? That shouldn't be too hard. We can just jack up the shared buffers and wal buffers and everything - it doesn't matter if it's not perfect, but it will at least give people an idea of what needs to be increased, etc to get good results. I'm currently benchmarking our new DB server before we put it into production. I plan to publish the results from that shortly. Regards, Chris -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Merlin Moncure Sent: Tuesday, 11 February 2003 11:44 PM To: Greg Copeland Cc: PostgresSQL Hackers Mailing List; [EMAIL PROTECTED] Subject: Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks I've tested all the win32 versions of postgres I can get my hands on (cygwin and not), and my general feeling is that they have problems with insert performance with fsync() turned on, probably the fault of the os. Select performance is not so much affected. This is easily solved with transactions and other such things. Also Postgres benefits from pl just like oracle. May I make a suggestion that maybe it is time to start thinking about tuning the default config file? IMHO it's just a little bit too conservative, and it's hurting you in benchmarks being run by idiots, but it's still bad publicity. Any real database admin would know his tests are synthetic and not meaningful without having to look at the #s. This is irritating me so much that I am going to put together a benchmark of my own, a real world one, on (publicly available) real world data. Mysql is a real dog in a lot of situations. The FCC publishes a database of wireless transmitters that has tables with 10 million records in it. I'll pump that into pg, run some benchmarks, real world queries, and we'll see which database *really* is faster. This is just a publicity issue, that's all. It's still annoying though. I'll even run an open challenge to database admins to beat the query performance of postgres on such datasets, complex multi table joins, etc. I'll even throw out the whole table locking issue and analyze single user performance. Merlin _ How much of the performance difference is from the RDBMS, from the middleware, and from the quality of implementation in the middleware? While I'm not surprised that the cygwin version of PostgreSQL is slow, those results don't tell me anything about the quality of the middleware interface between PHP and PostgreSQL. Does anyone know if we can rule out some of the performance loss by pinning it to bad middleware implementation for PostgreSQL? Regards, -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
After it's all said and done, I would rather someone simply say, "it's beyond my skill set", and attempt to get help or walk away. That seems better than them being able to run it and say, "it's a dog", spreading word-of-mouth as such after they left PostgreSQL behind. Worse yet, those that do walk away and claim it performs horribly are probably doing more harm to the PostgreSQL community than expecting someone to be able to install software ever can. RANT: And that, my friends, is why PostgreSQL is still relatively obscure. Dude - I hang out on PHPBuilder's database forums and you wouldn't believe how often the "oh, don't use Postgres, it has a history of database corruption problems" thing is mentioned. Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Contract Programmer Advice.
On Tue, Feb 11, 2003 at 08:56:56PM -0500, Lamar Owen wrote: This company doesn't dispute any of my invoices and says they are going to pay me. But they have not yet done so. This company is still in business, and I wouldn't know about your country, whichever that may be, but in the Netherlands it only takes two unpaid creditors to request bankruptcy for a company with a judge. Bankruptcy is declared when a company no longer pays its dues (note that ability doesn't come into this), and two outstanding debts are the minimum required to establish that. If all else fails, that kind of procedure gives a very strong incentive to pay up. Your country may have a similar arrangement. Jeroen ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Contract Programmer Advice.
In Canada we have small claims court, for claims up to $10,000, and it only costs you $50 to file a claim. They have to file a defense or settle within 30 days. Usually if they owe you the money it forces them to do something, either settle or *really* drag it out, but it gets the process moving. Jeroen T. Vermeulen wrote: On Tue, Feb 11, 2003 at 08:56:56PM -0500, Lamar Owen wrote: This company doesn't dispute any of my invoices and says they are going to pay me. But they have not yet done so. This company is still in business, and I wouldn't know about your country, whichever that may be, but in the Netherlands it only takes two unpaid creditors to request bankruptcy for a company with a judge. Bankruptcy is declared when a company no longer pays its dues (note that ability doesn't come into this), and two outstanding debts are the minimum required to establish that. If all else fails, that kind of procedure gives a very strong incentive to pay up. Your country may have a similar arrangement. Jeroen ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Heads up: Telegraph research project at UC Berkeley
Hi all: I wanted to give the PostgreSQL community a heads up on ongoing database research here at UC Berkeley, which may be of interest since it uses PGSQL. The last few years we've been building a system called Telegraph, which has a number of research thrusts: a) aggressively adaptive query optimization (based on the eddy concept) b) queries over external data sources, including pull (e.g. by scraping web pages) and push (e.g. data feeds from sensors) c) support for Continuous Queries over streaming push sources, including multi-query optimization and fault-tolerant parallelism The first version of the system was written from scratch in Java, which caused us some pain. About 8 months ago we abandoned our initial version of the system, and decided to start over in C. Rather than starting over from scratch, we decided to take PostgreSQL and enhance it (and also disembowel it somewhat) for our purposes. We're planning on releasing an alpha of the new system, called TelegraphCQ, in a couple weeks. It will include some but not all of the above features, and we'd be happy to have folks kick the tires a little. It will be interesting to see how/if our modifications can be of use to the broader PGSQL community. I encourage interested folks to have a look at our website at http://telegraph.cs.berkeley.edu, read some of the papers, etc. The best overview is our recent CIDR paper, http://www.cs.berkeley.edu/~franklin/Papers/TCQcidr03.pdf (Please see the related note on how we're now using PostgreSQL in our database classes both at Berkeley and CMU.) Regards, Joe Hellerstein -- Joseph M. Hellerstein Professor, EECS Computer Science Division UC Berkeley http://www.cs.berkeley.edu/~jmh ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Berkeley and CMU classes adopt/extend PostgreSQL
Joe Hellerstein [EMAIL PROTECTED] writes: I am teaching the undergrad DB course at UC Berkeley, something I do with some frequency. We have the usual 180 students we get every semester (yep: 180!), but this year we've instituted 2 changes: 1) We changed the course projects to make the students hack PostgreSQL internals, rather than the minibase eduware Cool. 2) We are coordinating the class with a class at CMU being taught by Prof. Anastassia (Natassa) Ailamaki Double cool. I'm just down the road, if Natassa needs a visiting lecturer. Our Homework 2, which is being passed out this week, will ask the students to implement a hash-based grouping that spills to disk. I understand this topic has been batted about the pgsql-hackers list recently. Yes. As of CVS tip, we have hash-based grouping but it doesn't spill to disk. Want to ask them to start from CVS tip and fix that little detail? Or fix the various other loose ends that have been mentioned lately? (make it work with DISTINCT, improve the estimation logic, some other things I'm forgetting) I'm hopeful this will lead to many good things: Yes, let's see what we can do with this ... seems like Postgres may be coming full circle ;-) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
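For anyone following along at home, the usual way to make hash grouping spill is Grace-style partitioning: scatter input rows across temp files by hash, then aggregate one partition at a time. A toy sketch with integer keys from stdin and count() as the only aggregate -- it assumes each partition's distinct keys fit in TABSIZE slots (the real executor has to recurse or adapt when they don't), and the in-memory linear probe stands in for a proper hash table:

#include <stdio.h>

#define NPARTS  4
#define TABSIZE 1024

static void aggregate_partition(FILE *f)
{
    int keys[TABSIZE], counts[TABSIZE];
    int nslots = 0, k, i;

    rewind(f);
    while (fread(&k, sizeof(k), 1, f) == 1)
    {
        for (i = 0; i < nslots && keys[i] != k; i++)
            ;                           /* find existing group, if any */
        if (i == nslots)
        {
            keys[nslots] = k;           /* start a new group */
            counts[nslots++] = 0;
        }
        counts[i]++;
    }
    for (i = 0; i < nslots; i++)
        printf("key %d: count %d\n", keys[i], counts[i]);
}

int main(void)
{
    FILE *part[NPARTS];
    int i, k;

    for (i = 0; i < NPARTS; i++)
        part[i] = tmpfile();

    /* Pass 1: spill every input row to a partition chosen by hash. */
    while (scanf("%d", &k) == 1)
        fwrite(&k, sizeof(k), 1, part[(unsigned) k % NPARTS]);

    /* Pass 2: each partition is now small enough to group in memory. */
    for (i = 0; i < NPARTS; i++)
    {
        aggregate_partition(part[i]);
        fclose(part[i]);
    }
    return 0;
}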
Re: [HACKERS] Changing the default configuration (was Re:
scott.marlowe [EMAIL PROTECTED] writes: ... The original choice of 32 was set because the original choice of 64 shared memory blocks as the most we could hope for on common OS installs. Now that we're looking at cranking that up to 1000, shouldn't max connections get a look too? Actually I think max-connections at 32 was set because of SEMMAX limits, and had only the most marginal connection to shared_buffers (anyone care to troll the archives to check?) But sure, let's take another look at the realistic limits today. ... If he starts running out of semaphores, that's a problem he can address while his database is still up and running in most operating systems, at least in the ones I use. Back in the day, this took a kernel rebuild and system reboot to fix. If this has changed, great ... but on exactly which Unixen can you alter SEMMAX on the fly? So, my main point is that any setting that requires you to shut down postgresql to make the change, we should pick a compromise value that means you never likely will have to shut down the database once you've started it up and it's under load. When I started using Postgres, it did not allocate the max number of semas it might need at startup, but was instead prone to fail when you tried to open the 17th or 33rd or so connection. It was universally agreed to be an improvement to refuse to start at all if we could not meet the specified max_connections setting. I don't want to backtrack from that. If we can up the default max_connections setting, great ... but let's not increase the odds of failing under load. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] location of the configuration files
I, personally, also think it makes more sense to pass to the postmaster a configuration file that contains all the rest of the information about the database system, including the disk locations of the various data directories and whatnot. cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[HACKERS] PostgreSQL Tuning Results
Hi Everyone, I have just completed a basic set of benchmarking on our new database server. I wanted to figure out a good value for shared_buffers before we go live. We are a busy ecommerce-style website and so we probably get 10 or 20 to 1 read transactions vs. write transactions. We also don't have particularly large tables. Attached are the charts for select only and tpc-b runs. Also attached is an OpenOffice.org spreadsheet with all the results, averages and charts. I place all these attachments in the public domain, so you guys can use them how you wish. I installed pgbench, and set up a pgbench database with scale factor 1. I then set shared_buffers to all the values between 2000 and 11000 and tested select and tpc-b with each. I ran each test 3 times and averaged the values. TPC-B was run after select so had advantages due to the buffers already being filled, but I was consistent with this. Machine: 256MB RAM, FreeBSD 4.7, EIDE HDD, 1 Ghz TPC-B config: pgbench -c 64 -t 100 pgbench (Note: only 64 users here) SELECT config: pgbench -c 128 -t 100 -S pgbench (Note: full 128 users here) I'm not sure why 8000 and 9000 are low on tpc-b, it's odd. Anyway, from the attached results you can see that 4000 buffers gave the best SELECT only performance, whereas the TPC-B stuff seemed to max out way up at 10000 or so. Since there is a 20% gain in performance on TPC-B going from 4000 buffers to 5000 buffers and only a 2% loss in performance for SELECTs, I have configured my server to use 5000 shared buffers, e.g. 45MB RAM. I am now going to leave it on 5000 and play with wal_buffers. Is there anything else people are interested in me trying? Later on, I'll run pg_autotune to see how its recommendation matches my findings. Chris attachment: pg_select.gif attachment: pg_tpcb.gif PostgreSQL Benchmark.sxc Description: OpenOffice Calc spreadsheet ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostgreSQL Tuning Results
-Original Message- From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED]] Sent: Tuesday, February 11, 2003 8:54 PM To: Hackers; Advocacy Subject: [HACKERS] PostgreSQL Tuning Results Hi Everyone, I have just completed a basic set of benchmarking on our new database server. I wanted to figure out a good value for shared_buffers before we go live. We are a busy ecommerce-style website and so we probably get 10 or 20 to 1 read transactions vs. write transactions. We also don't have particularly large tables. Attached are the charts for select only and tpc-b runs. Also attached is an OpenOffice.org spreadsheet with all the results, averages and charts. I place all these attachments in the public domain, so you guys can use them how you wish. I installed pgbench, and set up a pgbench database with scale factor 1. I then set shared_buffers to all the values between 2000 and 11000 and tested select and tpc-b with each. I ran each test 3 times and averaged the values. TPC-B was run after select so had advantages due to the buffers already being filled, but I was consistent with this. Machine: 256MB RAM, FreeBSD 4.7, EIDE HDD, 1 Ghz TPC-B config: pgbench -c 64 -t 100 pgbench (Note: only 64 users here) SELECT config: pgbench -c 128 -t 100 -S pgbench (Note: full 128 users here) I'm not sure why 8000 and 9000 are low on tpc-b, it's odd. Anyway, from the attached results you can see that 4000 buffers gave the best SELECT only performance, whereas the TPC-B stuff seemed to max out way up at 10000 or so. Since there is a 20% gain in performance on TPC-B going from 4000 buffers to 5000 buffers and only a 2% loss in performance for SELECTs, I have configured my server to use 5000 shared buffers, e.g. 45MB RAM. I am now going to leave it on 5000 and play with wal_buffers. Is there anything else people are interested in me trying? Keenly interested. Who wouldn't want to know how to optimize it? That's the hardest guideline to find. Later on, I'll run pg_autotune to see how its recommendation matches my findings. I would like to hear about that also. Please report on it. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Tuning Results
I am now going to leave it on 5000 and play with wal_buffers. Is there anything else people are interested in me trying? Keenly interested. Who wouldn't want to know how to optimize it? That's the hardest guideline to find. Oops - what that sentence was supposed to say is: "Is there anyone else interested in me trying any other variables?" What I don't really know is: what is actually affected by wal_buffers? I assume my select-only tests won't even touch the WAL, so I guess I have to just play with tpc-b. Chris ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
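In the same spirit, a sketch of a wal_buffers sweep; since select-only runs generate no WAL, only the write-heavy TPC-B runs are worth measuring here. The values tried and the file names are illustrative:

    #!/bin/sh
    # Hold shared_buffers at 5000 and vary wal_buffers (TPC-B only).
    for wal in 4 8 16 32 64; do
        pg_ctl -D "$PGDATA" -o "-B 5000 -c wal_buffers=$wal" restart
        sleep 5
        for run in 1 2 3; do
            pgbench -c 64 -t 100 pgbench | grep tps >> tpcb_wal_$wal.log
        done
    done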
Re: [HACKERS] PostgreSQL Tuning Results
Hi Chris, On Wed, 12 Feb 2003, Christopher Kings-Lynne wrote: Machine: 256MB RAM, FreeBSD 4.7, EIDE HDD, 1GHz Seems like a small amount of memory to be running memory-based tests with. What about testing sort_mem as well? It would seem to me that there would be no negative to having infinite sort_mem given infinite memory, though. Gavin ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL Tuning Results
Machine: 256MB RAM, FreeBSD 4.7, EIDE HDD, 1GHz Seems like a small amount of memory to be running memory-based tests with. Perhaps, but I'm benchmarking for that machine, not for any other. The results have to include the 256MB spec. Also, the peak was 25MB of SHM, which still leaves 231MB for the rest of the system, so surely RAM is not the bottleneck here? What about testing sort_mem as well? It would seem to me that there would be no negative to having infinite sort_mem given infinite memory, though. Yeah, however I'm pretty sure that pgbench doesn't perform any sorts. I reckon that sort_mem is the hardest thing to optimise! Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
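Since pgbench itself never sorts, sort_mem has to be exercised with a sort-heavy query instead; a rough sketch against the pgbench tables (the values are illustrative, and sort_mem is measured in kilobytes):

    psql pgbench <<'EOF'
    SET sort_mem = 1024;     -- try e.g. 1024, 4096, 16384 and compare timings
    EXPLAIN ANALYZE SELECT * FROM accounts ORDER BY abalance;
    EOF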
Re: [HACKERS] Projection while performing joins.
Anagh Lal [EMAIL PROTECTED] writes: ... I am still confused by the following: In /backend/executor/nodeMergeJoin.c, in ExecMergeJoin(), in the state (the switch case) EXEC_MJ_JOINTUPLES we still do ExecProject(). What does this do? Well, sure. A join node *must* do a projection, no? It can't simply return either the left or the right input tuple (except in the vanishingly-small fraction of cases where you don't actually need any columns from the right or the left respectively; which are cases that we don't currently bother to optimize). To create a tuple that's not exactly one or the other you must project. Some food for thought: let's ignore the attributes listed in the select clause and work only with the where clause (join condition) attributes, and as a back-reference store the tupleid of the original whole tuple in the working tuple. At the final output stage, perform a lookup to retrieve the select clause attributes of only the qualifying tuples, thus enabling us to work with really small-sized data. Worth trying out? Not sure there's a lot of traction here. In many cases, the bottom-level scan gets advanced one or more rows before the top-level nodes can pop out a result. (Consider GROUP BY as an example.) I don't see the advantage of adding bookkeeping/copying for past rows in order to avoid copying current-row data around. But feel free to prove me wrong ;-) regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
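To make the projection point concrete: any join whose target list mixes columns from both inputs has to build a brand-new output tuple, so ExecProject runs even in the join's matched state. A hedged illustration using the pgbench schema (table and column names as created by old pgbench):

    psql pgbench <<'EOF'
    -- The result rows contain a.aid (from the outer input) and b.bbalance
    -- (from the inner input), so the join node cannot return either input
    -- tuple as-is; it must project a new one.
    EXPLAIN SELECT a.aid, b.bbalance
              FROM accounts a JOIN branches b ON a.bid = b.bid;
    EOF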
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
On Tuesday 11 Feb 2003 8:01 pm, Mario Weilguni wrote: Hrm. I just saw that the PHP ADODB guy just published a bunch of database benchmarks. It's fairly evident to me that benchmarking PostgreSQL on Win32 isn't really fair: http://php.weblogs.com/oracle_mysql_performance And why is the highly advocated transaction-capable MySQL 4 not tested? That's the problem: for every performance test they choose ISAM tables, and when transactions are mentioned it's said that MySQL has transactions. But why no benchmarks? I did benchmark MySQL/PostgreSQL/Oracle some time back. MySQL with transactions is about 90% as fast as PostgreSQL, but it dies down with an increased number of users no matter how many resources you throw at it. Oracle is about 130% of PostgreSQL. This was the PostgreSQL 7.2.x series, so things have changed for sure, but you get the idea, right? Shridhar ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tuesday 11 Feb 2003 10:56 pm, you wrote: Josh Berkus [EMAIL PROTECTED] writes: What if we supplied several sample .conf files, and let the user choose which to copy into the database directory? We could have a high read performance profile, and a transaction database profile, and a workstation profile, and a low impact profile. Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details. Let's take a very simple scenario for supplying a pre-configured postgresql.conf. Assume that SHMMAX = total memory/2 and supply different config files for 64MB/128MB/256MB/512MB and above. Is that simple enough? Shridhar ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Maximum Size for Large Object / TOASTed Object
Paul Ramsey [EMAIL PROTECTED] writes: What is the maximum size a large object can be in PostgreSQL? What is the maximum size a TOASTed object can be in PostgreSQL? TOASTed fields max out at 1GB. Large objects max out at either 2 or 4GB ... I'm not sure whether we consistently use signed or unsigned arithmetic for them. (Probably they *should* max out at 4GB, and if you find any places that fail over 2GB, those are bugs for which patches will be gladly accepted.) regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
There's The Open Source Database Benchmark, http://osdb.sourceforge.net/. Anyone tried to use it? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Windows SHMMAX (was: Default configuration)
On Tue, 11 Feb 2003, Merlin Moncure wrote: Another way of looking at it is memory mapped files. This probably most closely resembles unix shared memory and is the de facto standard way for interprocess memory block sharing. Sadly, performance will suffer because you have to rely on the virtual memory system (think: writing to files) to do a lot of stupid stuff you don't necessarily want or need. To the contrary, for the majority of the shared memory usage of postgres, which is cached file data, the virtual memory system is doing exactly what you want it to: managing the movement of data between memory and disk, and caching the more frequently accessed data to reduce the chances you will actually need to access the disk for it. For shared memory used only for IPC, a VM system typically treats it no differently from any other non-shared memory, so if it's doing something you don't want or need (a proposition I quite heartily disagree with), it's going to be doing that for every piece of memory your application allocates and uses, shared or not. The OS has to guarantee that the memory can be swapped out to file at any time and therefore mirrors the pagefile to the allocated memory blocks. The OS does not need to write the pagefile. On modern Unix systems that are not allowing overcommit, the space will be allocated but never written unless there's a need to free up some physical memory, and the pages in question are used infrequently enough that the system decides that they are good candidates to be paged out. I would imagine that Windows does the same. cjs -- Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] location of the configuration files
mlw wrote: AFAIK it wasn't actually done. It was more of a "we should do something different" argument. At one point there was talk of rewriting the configuration system to allow include and other things. That seems like extreme overkill. The PostgreSQL configuration mechanism doesn't seem to me to be anywhere near complicated enough to justify an include mechanism. I agree with you: you should be able to specify all of the base configuration information (including the location of the data directories) in one file, and it makes perfect sense to me for the location of the data directory to be a GUC variable. I'd say the only thing the postmaster needs to know prior to startup is the directory containing the postgresql.conf file. An administrator who wishes to set up multiple independent databases can easily do so by using multiple config file directories. When consistency is required, he can easily use symlinks to point to master config files where appropriate. I assume $PGDATA was around long before GUC? -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
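A sketch of that administration pattern under the current layout, where each cluster's postgresql.conf lives in its $PGDATA (paths and the second port are illustrative):

    # Two independent clusters:
    initdb -D /var/db/pg-main
    initdb -D /var/db/pg-test
    # Shared base configuration kept in one master file, pulled in via symlink:
    ln -sf /usr/local/etc/postgresql.conf.master /var/db/pg-test/postgresql.conf
    # Start both, on different ports:
    postmaster -D /var/db/pg-main &
    postmaster -D /var/db/pg-test -p 5433 &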
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
-Original Message- From: ow [mailto:[EMAIL PROTECTED]] Sent: Tuesday, February 11, 2003 10:32 PM To: Shridhar Daithankar[EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks There's The Open Source Database Benchmark, http://osdb.sourceforge.net/. Anyone tried to use it?

It requires a real Linux or UNIX installation and won't run under Cygwin. At least, I could not get it to work: I got it compiled, but I am missing something to make it work correctly.

Here are some other benchmark things:

http://www.eweek.com/article2/0,3959,293,00.asp which has this link: ftp://ftp.eweek.com/pub/eweek/pdf/printpub/benchmark/dbbenchmark_v1.zip

This article: http://www.dcc.unicamp.br/~celio/mc527/interbase/PCweek_test.html has this code: ftp://ftp.zdnet.com/pcweek/labs/0207codefile.zip but it requires the very expensive Benchmark Factory product. It might be useful to people who already have Benchmark Factory.

Another open source database test: http://sourceforge.net/projects/osdldbt

Java database benchmark routines: http://www.firstsql.com/firstsqlj/

This site: http://www.mipt.sw.ru/ (which does not appear to be available right now) has three excellent benchmark tools: ATS, LTS, OTS. They are Windows-centric, though, and would probably be very hard to port to Unix environments.

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
A financial database benchmark: http://www.cs.nyu.edu/cs/faculty/shasha/fintime.html ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
This is how to get the FIPS benchmark. It measures CONFORMANCE rather than performance: http://www.itl.nist.gov/div897/ctg/sql_form.htm ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
The benchmark handbook: http://www.benchmarkresources.com/handbook/contents.asp ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Hash grouping, aggregates
Tom Lane wrote on Tue, 11.02.2003 at 18:39: Bruno Wolff III [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Greg Stark [EMAIL PROTECTED] writes: The neat thing is that hash aggregates would allow grouping on data types that have '=' operators but no useful '<' operator. Hm. Right now I think that would barf on you, because the parser wants to find the '<' operator to label the grouping column with, even if the planner later decides not to use it. It'd take some redesign of the query data structure (specifically SortClause/GroupClause) to avoid that. I think another issue is that for some '=' operators you still might not be able to use a hash. I would expect the discussion for hash joins in http://developer.postgresql.org/docs/postgres/xoper-optimization.html would apply to hash aggregates as well. Right, the '=' operator must be hashable or you're out of luck. But we could imagine tweaking the parser to allow GROUP BY if it finds a hashable '=' operator and no sort operator. The only objection I can see to this is that it means the planner *must* use hash aggregation, which might be a bad move if there are too many distinct groups. If we run out of sort memory, we can always bail out later, preferably with a descriptive error message. It is not as elegant as erring out at parse (or even plan/optimise) time, but the result is /almost/ the same. Relying on hash aggregation will become essential if we are ever going to implement the other groupings (CUBE, ROLLUP, (), ...), so it would be nice if hash aggregation could also overflow to disk - I suspect that this will still be faster than running an independent scan for each GROUP BY grouping and merging the results. - Hannu ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
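On a build that has hashed aggregation, the planner's choice between sorted and hashed grouping can be observed (and nudged) from psql; a hedged sketch against the pgbench tables, assuming the grouped column's '=' operator is hashable:

    psql pgbench <<'EOF'
    SET enable_sort = off;   -- discourage the sort+group plan
    EXPLAIN SELECT bid, count(*) FROM accounts GROUP BY bid;
    EOF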
Re: [HACKERS] location of the configuration files
Kevin Brown [EMAIL PROTECTED] writes: I assume $PGDATA was around long before GUC? Yes, it was. But I have not yet seen an argument here that justifies why $SOMECONFIGDIRECTORY/postgresql.conf is better than $PGDATA/postgresql.conf. The latter keeps all the related files together. The former seems only to introduce unnecessary complexity. You can only justify it as simpler if you propose hardwiring a value for $SOMECONFIGDIRECTORY ... which is a proposal that will not fly with any of the core developers, because we all run multiple versions of Postgres on our machines so that we can deal with back-version bug reports, test installations, etc. It is unlikely to fly with any of the RPM packagers either, due to the wildly varying ideas out there about the One True Place where applications should put their config files. (This point was pretty much why mlw's previous proposal was rejected, IIRC.) regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PostgreSQL Tuning Results
Gavin Sherry wrote: Hi Chris, On Wed, 12 Feb 2003, Christopher Kings-Lynne wrote: Machine: 256MB RAM, FreeBSD 4.7, EIDE HDD, 1GHz Seems like a small amount of memory to be running memory-based tests with. What about testing sort_mem as well? It would seem to me that there would be no negative to having infinite sort_mem given infinite memory, though. Gavin Be careful with sort_mem - this might lead to VERY unexpected results. I did some testing on my good old Athlon 500 with a brand new IBM 120GB HDD. Reducing the sort_mem gave me significantly faster results when sorting/indexing 20,000,000 random rows. However, it would be nice to see the results of concurrent sorts. Hans -- *Cybertec Geschwinde u Schoenig* Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/1/913 68 09; +43/664/233 90 75 www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane writes: We could retarget to try to stay under SHMMAX=4M, which I think is the next boundary that's significant in terms of real-world platforms (isn't that the default SHMMAX on some BSDen?). That would allow us 350 or so shared_buffers, which is better, but still not really a serious choice for production work. What is a serious choice for production work? And what is the ideal choice? The answer probably involves some variables, but maybe we should get values for those variables in each case and work from there. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom, Justin, What I would really like to do is set the default shared_buffers to 1000. That would be 8 meg worth of shared buffer space. Coupled with more-realistic settings for FSM size, we'd probably be talking a shared memory request approaching 16 meg. This is not enough RAM to bother any modern machine from a performance standpoint, but there are probably quite a few platforms out there that would need an increase in their stock SHMMAX kernel setting before they'd take it. What if we supplied several sample .conf files, and let the user choose which to copy into the database directory? We could have a high read performance profile, and a transaction database profile, and a workstation profile, and a low impact profile. We could even supply a Perl script that would adjust SHMMAX and SHMALL on platforms where this can be done from the command line. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
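The underlying kernel tweaks such a script would perform are one-liners; a sketch, assuming root, with illustrative sizes (note that SHMALL is measured in pages, not bytes):

    # Linux:
    sysctl -w kernel.shmmax=134217728    # 128MB max segment size
    sysctl -w kernel.shmall=32768        # in pages
    # FreeBSD:
    sysctl -w kern.ipc.shmmax=134217728
    sysctl -w kern.ipc.shmall=32768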
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Josh Berkus [EMAIL PROTECTED] writes: What if we supplied several sample .conf files, and let the user choose which to copy into the database directory? We could have a high read performance profile, and a transaction database profile, and a workstation profile, and a low impact profile. Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details. A lower-tech way to accomplish the same result is to document these alternatives in postgresql.conf comments and encourage people to review that file, as Steve Crawford just suggested. But first we need the raw knowledge. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Josh Berkus wrote: Tom, Justin, snip What if we supplied several sample .conf files, and let the user choose which to copy into the database directory? We could have a high read performance profile, and a transaction database profile, and a workstation profile, and a low impact profile. We could even supply a Perl script that would adjust SHMMAX and SHMALL on platforms where this can be done from the command line. This might have value as the next step in the process of deciding between: a) Are we going to have better defaults? or b) Let's stick with the current approach. If we decide to go with better (changed) defaults, we may also be able to figure out a way of having profiles that could optionally be chosen from. As a longer-term thought, it would be nice if the profiles weren't just hard-coded example files, but more like: pg_autotune --setprofile=xxx or a similar utility that did all the work, with named profiles being one capability, and other tuning measurements (i.e. CPU costings, disk performance profiles, etc.) being the others. Regards and best wishes, Justin Clift -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Status report: regex replacement
Tatsuo Ishii writes: UTF-8 seems to be the most popular, but even XML standard requires all compliant implementations to deal with at least both UTF-8 and UTF-16. I don't think PostgreSQL is going to natively support UTF-16. At FOSDEM it was claimed that Windows natively uses UCS-2, and there are also continuing rumours that the Java Unicode encoding is not quite UTF-8, so there is going to be a certain pressure to support other Unicode encodings besides UTF-8. As for the names, the SQL standard defines most of those. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom Lane wrote: snip Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details. A lower-tech way to accomplish the same result is to document these alternatives in postgresql.conf comments and encourage people to review that file, as Steve Crawford just suggested. But first we need the raw knowledge. Without too much hacking around, you could pretty easily adapt the pg_autotune code to do proper profiles of a system with different settings. i.e. increment one setting at a time, run pgbench on it with some decent amount of transactions and users, stuff the results into a different database. Aggregate data over time kind of thing. Let it run for a week, etc. If it's helpful, there's a 100% spare Athlon 1.6GHz box with (choose your OS) + Adaptec 29160 + 512MB RAM + 2 x 9GB Seagate Cheetah 10k rpm drives hanging around. No stress to set that up and let it run any long-term tests you'd like, plus send back results. Regards and best wishes, Justin Clift regards, tom lane -- My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there. - Indira Gandhi ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Tom, Justin, Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details. Sure.

Mostly-read database, few users, good hardware, complex queries = high shared buffers and sort mem, high geqo and join collapse thresholds, moderate fsm settings, defaults for WAL.

Same as above with many users and simple queries (webserver) = same as above, except lower sort mem and a higher connection limit.

High-transaction database = moderate shared buffers and sort mem, high FSM settings, increased WAL files and buffers.

Workstation = moderate to low shared buffers and sort mem, moderate FSM, defaults for WAL, etc.

Low-impact server = current defaults, more or less.

While none of these settings will be *perfect* for anyone, they will be considerably better than what's shipping with PostgreSQL. And, based on my Learning Perl knowledge, I'm pretty sure I could write the program. All we'd need to do is argue out, on the PERFORMANCE list, what's a good value for each profile. That's the tough part; the Perl script is easy. A lower-tech way to accomplish the same result is to document these alternatives in postgresql.conf comments and encourage people to review that file, as Steve Crawford just suggested. But first we need the raw knowledge. That's also not a bad approach ... the CONF file should be more heavily commented, period, regardless of what approach we take. I volunteer to work on this with other participants. Without too much hacking around, you could pretty easily adapt the pg_autotune code to do proper profiles of a system with different settings. No offense, Justin, but I don't know anyone else who's gotten your pg_autotune script to run other than you. And pgbench has not been a useful performance measure for any real database server I have worked on so far. I'd be glad to help improve pg_autotune, with two caveats: 1) We will still need to figure out the profiles above so that we have decent starting values. 2) I suggest that we do pg_autotune in Perl or Python or another higher-level language. This would enable several performance buffs who don't do C to contribute to it, and a performance-tuning script is a higher-level-language sort of function anyway. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
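As a concrete strawman for one of these, the webserver profile might come out as a postgresql.conf fragment like the following; every number here is illustrative, exactly the sort of value the PERFORMANCE list would need to argue out:

    # "Many users, simple queries" (webserver) profile -- illustrative values only
    max_connections = 256
    shared_buffers = 4096          # 32MB at the default 8kB page size
    sort_mem = 2048                # kB; kept low because many backends sort at once
    effective_cache_size = 12000   # 8kB pages assumed cached by the OS
    wal_buffers = 8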
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
Justin Clift [EMAIL PROTECTED] writes: Tom Lane wrote: Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details. Without too much hacking around, you could pretty easily adapt the pg_autotune code to do proper profiles of a system with different settings. i.e. increment one setting at a time, run pgbench on it with some decent amount of transactions and users, stuff the results into a different database. If I thought that pgbench was representative of anything, or even capable of reliably producing repeatable numbers, then I might subscribe to results derived this way. But I have little or no confidence in pgbench. Certainly I don't see how you'd use it to produce recommendations for a range of application scenarios, when it's only one very narrow scenario itself. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 7.2 - 7.3 incompatibility
On Mon, 2003-02-10 at 19:13, Peter Eisentraut wrote: Christopher Kings-Lynne writes: I found an example of a casting problem in our source code now that we're running 7.3: SELECT CURRENT_DATE - EXTRACT(DOW FROM CURRENT_DATE); A mathematically sound way to write this would be: select current_date - extract(dow from current_date) * interval '1 day'; I'm not sure really why DOW needs to be double precision, but hey... Extract returns double precision. It can't morph itself based on the argument type at run time. Is this mandated by ANSI? PostgreSQL _does_ select a function based on argument type:

hannu=# create function f(int) returns int as 'select 1' language 'sql';
CREATE
hannu=# create function f(int,int) returns float as 'select 3.1415927' language 'sql';
CREATE
hannu=# select f(1),f(1,1);
 f |     f
---+-----------
 1 | 3.1415927
(1 row)

-- Hannu Krosing [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
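For completeness, the 7.3-friendly rewrite can also keep the result typed as a date by casting the extract() result back to an integer; a hedged alternative to the interval arithmetic above:

    psql <<'EOF'
    -- date - integer yields date, so casting the double precision dow works too:
    SELECT current_date - extract(dow FROM current_date)::integer;
    EOF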
Re: [HACKERS] pgbash-7.3 released
On Tue, 2003-02-11 at 08:46, SAKAIDA Masaaki wrote: I'm pleased to announce the release of pgbash-7.3. http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html How do the non-PostgreSQL features of pgbash relate to standard bash? Do you also keep up to date with new releases of bash? or is there no connection? -- Oliver Elphick[EMAIL PROTECTED] Isle of Wight, UK http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C Make a joyful noise unto the LORD, all ye lands. Serve the LORD with gladness; come before his presence with singing. Know ye that the LORD he is God; it is he that hath made us, and not we ourselves; we are his people, and the sheep of his pasture. Psalms 100:1-3 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pgbash-7.3 released
Oliver Elphick [EMAIL PROTECTED] wrote: On Tue, 2003-02-11 at 08:46, SAKAIDA Masaaki wrote: I'm pleased to announce the release of pgbash-7.3. http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html How do the non-PostgreSQL features of pgbash relate to standard bash? Please see: http://www.psn.co.jp/PostgreSQL/pgbash/usage/usage11-e.html Do you also keep up to date with new releases of bash? Yes, I would like to. The newest release of bash is 'bash-2.05b', and I have already made a patch for it. But 'bash-2.05b' seems to have a bug in saving array shell variables, so pgbash-7.3 could not ship with the bash-2.05b patch. -- SAKAIDA Masaaki (Sorry, I am not good at English.) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]
On Tue, 2003-02-11 at 13:01, Tom Lane wrote: Jon Griffin [EMAIL PROTECTED] writes: So it appears that linux at least is way above your 8 meg point, unless I am missing something. Yeah, AFAIK all recent Linuxen are well above the range of parameters that I was suggesting (and even if they weren't, Linux is particularly easy to change the SHMMAX setting on). It's other Unixoid platforms that are likely to have a problem. Particularly the ones where you have to rebuild the kernel to change SHMMAX; people may be afraid to do that. The issue as I see it is: better-performing vs. more-compatible out-of-the-box defaults. Perhaps a compromise (hack?): set the default to some value that performs well, a value we all agree is not too big (16MB? 32MB?). On startup, if the OS can't give us what we want, instead of failing, we can try again with a smaller amount, perhaps half the default; if that fails, halve again until we reach some bottom threshold (1MB?). The argument against this might be: when I set shared_buffers=X, I want X shared buffers; I don't want it to fail silently and give me less than what I need/want. To address this we might want to add a GUC option that controls this behavior. So we ship postgresql.conf with 32MB of shared memory and auto_shared_mem_reduction = true, with a comment that the administrator might want to turn this off for production. Thoughts? I think this will allow most uninformed users to get decent-performing defaults, as most systems will accommodate this larger value. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
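The proposed fallback can be prototyped from the outside today, which may help in evaluating the idea; a sketch of the semantics as a shell loop (auto_shared_mem_reduction itself is hypothetical, existing only in this proposal, and the exact pg_ctl exit-status behavior should be verified):

    #!/bin/sh
    # Try the configured size first; halve on startup failure down to a floor.
    buffers=4096                        # 32MB at 8kB pages
    while [ $buffers -ge 128 ]; do      # floor: 1MB
        if pg_ctl start -w -D "$PGDATA" -o "-B $buffers"; then
            echo "started with $buffers shared buffers"
            break
        fi
        buffers=$(($buffers / 2))
    done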
Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)
Matthew T. O'Connor [EMAIL PROTECTED] writes: ... So we ship postgresql.conf with 32M of shared memory and auto_shared_mem_reduction = true. With a comment that the administrator might want to turn this off for production. This really doesn't address Justin's point about clueless benchmarkers, however. In fact I fear it would make that problem worse: if Joe Blow says he got horrible performance, who knows whether he was running with a reasonable number of buffers or not? Especially when you ask him "did you have lots of shared buffers?" and he responds "yes, of course, it says 32M right here". We've recently been moving away from the notion that it's okay to silently lose functionality in order to run on a given system. For example, if you want to install without readline, you now have to explicitly tell configure that, because we heard "why don't I have history in psql?" way too often from people who just ran configure and paid no attention to what it told them. I think that what this discussion is really leading up to is that we are going to decide to apply the same principle to performance. The out-of-the-box settings ought to give reasonable performance, and if your system can't handle it, you should have to take explicit action to acknowledge the fact that you aren't going to get reasonable performance. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])