[HACKERS] pgbash-7.3 released

2003-02-11 Thread SAKAIDA Masaaki

I'm pleased to announce the release of pgbash-7.3.
http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html

ChangeLog:
1. Fix a bug in CSV-format handling in the (pgbash original) copy command. 
2. Update 'pgbashrc' for PostgreSQL-7.3. 
3. The pgbash version number now matches the PostgreSQL version number.


--
SAKAIDA Masaaki 



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] PGP signing releases

2003-02-11 Thread Greg Copeland
Well said.   I'm glad someone else is willing to take a stab at
addressing these issues, since I've been down with the flu.  Thanks
Greg.

As both Gregs have pointed out, hashes and checksums alone should only
be used as an integrity check; they are not a viable security mechanism.
A hash provides neither authentication nor, more importantly,
verification of authentication.  These concepts are key to creating a
secure environment.
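To make the distinction concrete, here is a minimal Python sketch (the file
contents are made up) of why a checksum is an integrity check but not a
security mechanism: an attacker who can replace the tarball can just as
easily publish a matching checksum.

```python
import hashlib

def md5_of(data: bytes) -> str:
    """Compute the MD5 digest of a byte string, as md5sum would."""
    return hashlib.md5(data).hexdigest()

# The project publishes a tarball and its checksum.
original = b"postgresql-7.3.tar.gz contents"
published_sum = md5_of(original)

# Integrity: accidental corruption is caught...
corrupted = b"postgresql-7.3.tar.gz c0ntents"
assert md5_of(corrupted) != published_sum

# ...but authenticity is not: an attacker who controls the server
# replaces BOTH the tarball and the published checksum, and the
# victim's verification still "passes".
trojaned = b"trojaned tarball contents"
attacker_sum = md5_of(trojaned)
assert md5_of(trojaned) == attacker_sum
```

A PGP signature avoids this because the attacker cannot produce a signature
with the project's private key, no matter what else on the server is replaced.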


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting

On Mon, 2003-02-10 at 21:57, [EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 
  So you put the MD5 sum into the release announcement email.  That is
  downloaded by many people and also archived in many distributed places
  that we don't control, so it would be very hard to tamper with.  
  ISTM that this gives you the same result as a PGP signature but with 
  much less administrative overhead.
 
 Not the same results. For one thing, the mailing announcement may be 
 archived on google, but asking people to search google for an MD5 sum 
 as they download the tarball is hardly feasible. Second, it still does 
 not prevent someone from breaking into the server and replacing the 
 tarball with their own version, and their own MD5 checksum. Or maybe 
 just one of the mirrors. Users are not going to know to compare that 
 MD5 with versions on the web somewhere. Third, it does not allow a 
 positive history to be built up due to signing many releases over time. 
 With PGP, someone can be assured that the 9.1 tarball they just 
 downloaded was signed by the same key that signed the 7.3 tarball 
 they've been using for 2 years. Fourth, only with PGP can you trace 
 your key to the one that signed the tarball, an additional level of 
 security. MD5 provides an integrity check only. Any security it 
 affords (such as storing the MD5 sum elsewhere) is trivial and 
 should not be considered when using PGP is standard, easy to implement,
 and has none of MD5's weaknesses.
 
 - --
 Greg Sabino Mullane  [EMAIL PROTECTED]
 PGP Key: 0x14964AC8 200302102250
 -BEGIN PGP SIGNATURE-
 Comment: http://www.turnstep.com/pgp.html
 
 iD8DBQE+SA4AvJuQZxSWSsgRAhenAKDu0vlUBC5Eodyt2OxTG6el++BJZACguR2i
 GGLAzhtA7Tt9w4RUYXY4g2U=
 =3ryu
 -END PGP SIGNATURE-
 
 
 
 



[HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Christopher Kings-Lynne
Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
Win32 isn't really fair:

http://php.weblogs.com/oracle_mysql_performance

*sigh*

Chris






Re: [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Mario Weilguni
Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
Win32 isn't really fair:

http://php.weblogs.com/oracle_mysql_performance

And why is the highly advocated, transaction-capable MySQL 4 not tested?
That's the problem: for every performance test they choose ISAM tables, and
whenever transactions are mentioned, it's said that MySQL has transactions.
But why no benchmarks?

Regards,
Mario Weilguni




Re: [HACKERS] log_duration

2003-02-11 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Christopher Kings-Lynne [EMAIL PROTECTED] writes:
  Looking at the log_duration postgresql.conf option.  How about adding an
  option log_duration_min which is a value in milliseconds that is the minimum
  time a query must run for before being logged.
 
 Fine with me --- but you'll need to add more logic than that.  Right
 now, log_duration *only* causes the query duration to be printed out;
 if you ain't got log_statement on, you're in the dark as to what the
 query itself was.  You'll need to add some code to print the query
 (the log_min_error_statement logic might be a useful source of
 inspiration).  Not sure how this should interact with the case where
 log_duration is set and the min-duration isn't.  But maybe that case
 is silly, and we should just redefine log_duration as a minimum runtime
 that causes the query *and* its runtime to be printed to the log.

Is it even guaranteed to be properly ordered on a busy server with multiple
processors anyways?

One option is to have log_query output an identifier with the query such as a
hash of the query or the pointer value for the plan, suppressing duplicates.
Then log_duration prints the identifier with the duration. 

This means on a busy server running lots of prepared queries you would see a
whole bunch of queries on startup, then hopefully no durations. Any durations
printed could cause alarms to go off. To find the query you grep the logs for
the identifier in the duration message.

This only really works if you're using prepared queries everywhere. But in the
long run that will be the case for OLTP systems, which is where log_duration
is really useful.
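A toy sketch of the scheme described above (the names are invented): each
distinct query text is logged once under a short hash identifier, and
afterwards only identifier-plus-duration lines are emitted for queries
exceeding a minimum duration.

```python
import hashlib

class DurationLogger:
    """Log each distinct query text once with a short identifier,
    then log only (identifier, duration) pairs for slow executions."""

    def __init__(self, min_duration_ms: float = 0.0):
        self.min_duration_ms = min_duration_ms
        self.seen = set()
        self.lines = []

    def log(self, query: str, duration_ms: float):
        # Identify the query by a short hash of its text.
        qid = hashlib.md5(query.encode()).hexdigest()[:8]
        if qid not in self.seen:          # suppress duplicate query texts
            self.seen.add(qid)
            self.lines.append(f"query {qid}: {query}")
        if duration_ms >= self.min_duration_ms:
            self.lines.append(f"duration {qid}: {duration_ms:.1f} ms")

log = DurationLogger(min_duration_ms=100)
log.log("SELECT * FROM t WHERE id = $1", 2.0)    # fast: text logged, no duration
log.log("SELECT * FROM t WHERE id = $1", 350.0)  # slow: duration line only
```

To find the offending query, one greps the log for the identifier that
appears in the duration line.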

--
greg





Re: [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 08:26, Christopher Kings-Lynne wrote:
 Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
 benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
 Win32 isn't really fair:
 
 http://php.weblogs.com/oracle_mysql_performance
 
 *sigh*

How much of the performance difference comes from the RDBMS, from the
middleware, and from the quality of implementation in the middleware?

While I'm not surprised that the cygwin version of PostgreSQL is
slow, those results don't tell me anything about the quality of the
middleware interface between PHP and PostgreSQL.  Does anyone know if we
can rule out some of the performance loss by pinning it on a bad
middleware implementation for PostgreSQL?


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Re: [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 08:31, Mario Weilguni wrote:
 Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
 benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
 Win32 isn't really fair:
 
 http://php.weblogs.com/oracle_mysql_performance
 
 And why is the highly advocated transaction capable MySQL 4 not tested?
 That's the problem, for every performance test they choose ISAM tables, and
 when transactions are mentioned it's said MySQL has transactions. But why
 no benchmarks?
 


Insert Statement

Not using bind variables (MySQL and Oracle): 
$DB->BeginTrans();



Using bind variables: 
$DB->BeginTrans();


PL/SQL Insert Benchmark
Appears to not initiate a transaction.  I'm assuming this is because
it's implicitly within a transaction?  Oddly enough, I am seeing
explicit commits here.

It appears that the benchmarks are attempting to use transactions,
however, I have no idea if MySQL's HEAP supports them.  For all I know,
transactions are being silently ignored.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





[HACKERS] Hash grouping, aggregates

2003-02-11 Thread Greg Stark

So one of the items on the TODO list is Add hash for evaluating GROUP BY
aggregates (Tom) 

I'm finding this would benefit a lot of my queries. Most of the time seems to
be going into sorts for group by clauses. I don't know how long it would take
to build a hash of course, but I suspect it would be less than the sort.

Is this something a beginner could figure out? I'm thinking I need a normal
Hash node that builds exactly the same kind of hash as a join, then a HashScan
node that picks all the rows out of the hash.

The neat thing is that hash aggregates would allow grouping on data types that
have '=' operators but no useful '<' operator.
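As a sketch of the idea, the aggregation reduces to a dictionary keyed on the
grouping value; note that only equality (and hashability) of the key is
needed, no ordering:

```python
def hash_group_sum(rows, key, value):
    """Hash-based GROUP BY ... SUM(...): one pass over the input, no sort.
    Only '=' (hash equality) on the grouping key is required."""
    groups = {}
    for row in rows:
        k = row[key]
        groups[k] = groups.get(k, 0) + row[value]
    return groups

rows = [
    {"dept": "eng", "salary": 10},
    {"dept": "ops", "salary": 7},
    {"dept": "eng", "salary": 5},
]
assert hash_group_sum(rows, "dept", "salary") == {"eng": 15, "ops": 7}
```

A sort-based GROUP BY would instead order the rows by `dept` first, which is
exactly the step that requires a '<' operator.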

(Incidentally, I'm fond of "nested loop"; I remember looking at plans as a
beginner SQL programmer, and it was intuitively obvious what it meant. I
suspect that for a beginner looking at "nestloop" it might not be quite so
obvious.)

-- 
greg





Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Bruno Wolff III
On Tue, Feb 11, 2003 at 09:48:11 -0500,
  Greg Stark [EMAIL PROTECTED] wrote:
 
 So one of the items on the TODO list is Add hash for evaluating GROUP BY
 aggregates (Tom) 
 
 I'm finding this would benefit a lot of my queries. Most of the time seems to
 be going into sorts for group by clauses. I don't know how long it would take
 to build a hash of course, but I suspect it would be less than the sort.
 
 Is this something a beginner could figure out? I'm thinking I need a normal
 Hash node that builds exactly the same kind of hash as a join, then a HashScan
 node that picks all the rows out of the hash.

This is already in 7.4. You could try it out by building from CVS.
From the HISTORY file:
System can use either hash- or sort-based strategy for grouped
aggregation




Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Greg Stark

Bruno Wolff III [EMAIL PROTECTED] writes:

 This is already in 7.4. You could try it out by building from CVS.
 From the HISTORY file:
 System can use either hash- or sort-based strategy for grouped
 aggregation

Ooh, doing that now. Thanks.


-- 
greg





Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 So one of the items on the TODO list is Add hash for evaluating GROUP BY
 aggregates (Tom) 

It's done in CVS tip ... give it a try.

 The neat thing is that hash aggregates would allow grouping on data types that
 have '=' operators but no useful '<' operator.

Hm.  Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it.  It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.

regards, tom lane




Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Merlin Moncure
I've tested all the win32 versions of postgres I can get my hands on
(cygwin and not), and my general feeling is that they have problems with
insert performance with fsync() turned on, probably the fault of the os.
Select performance is not so much affected.

This is easily solved with transactions and other such things.  Also,
Postgres benefits from PL just like Oracle.

May I make a suggestion that maybe it is time to start thinking about
tuning the default config file?  IMHO it's just a little bit too
conservative, and it's hurting you in benchmarks being run by idiots; even
so, it's still bad publicity.  Any real database admin would know such
tests are synthetic and not meaningful without having to look at the #s.

This is irritating me so much that I am going to put together a
benchmark of my own, a real world one, on (publicly available) real
world data.  Mysql is a real dog in a lot of situations.  The FCC
publishes a database of wireless transmitters that has tables with 10
million records in it.  I'll pump that into pg, run some benchmarks,
real world queries, and we'll see who the faster database *really* is.
This is just a publicity issue, that's all.  It's still annoying though.

I'll even run an open challenge to database admin to beat query
performance of postgres in such datasets, complex multi table joins,
etc.  I'll even throw out the whole table locking issue and analyze
single user performance.

Merlin 



_
How much of the performance difference comes from the RDBMS, from the
middleware, and from the quality of implementation in the middleware?

While I'm not surprised that the cygwin version of PostgreSQL is
slow, those results don't tell me anything about the quality of the
middleware interface between PHP and PostgreSQL.  Does anyone know if we
can rule out some of the performance loss by pinning it on a bad
middleware implementation for PostgreSQL?


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting







Re: [HACKERS] PGP signing releases

2003-02-11 Thread Greg Copeland
On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
 On Thu, 5 Feb 2003, Greg Copeland wrote:
 
  Who will actually hold the key? Where will it be physically kept?
   
Good question but can usually be addressed.
  
   It can be addressed, but how well? This is another big issue that I
   don't see any plan for that I'm comfortable with..
 
  The reason I was vague is because it depends on the key route.
  Obviously, if each person signs, each person must protect their own
  key.  If there is a central project key, it's simply a matter of
  determining which box is used for signing, etc...while important, it's
  certainly not difficult to address.
 
 It seems to me extremely difficult to address. Unless you are physically
 monitoring someone, how do you prevent someone from copying the key off
 of that machine. At which point anybody with the passphrase can use it
 for anything.
 

This issue doesn't change regardless of the mechanism you pick.  Anyone
that is signing a key must take reasonable measures to ensure the
protection of their key.


  How many people will know the passphrase?
   
As few as possible.  Ideally only two, maybe three core developers.
  
   Um...I'm not sure that this is a relevant question at all. The
   passphrase is not part of the key; it's just used to encrypt the key for
   storage. If you know the passphrase, you can make unlimited copies of
   the key, and these copies can be protected with any passphrases you like,
   or no passphrase, for that matter.
 
  If you're concerned about this to that extent, clearly those people
  should not part of the web of trust nor should they be receiving the
  passphrase nor a copy of the private key.  Remember, trust is a key (pun
  intended) part of a reliable PKI.
 
 In that case, I would trust only one person with the key. Making copies of
 the key for others gives no additional protection (since it takes only one
 person out of the group to sign the release) while it increases the chance
 of key compromise (since there are now more copies of the key kicking around,
 and more people who know the passphrase).
 

Which brings us back to backups.  Should the one person that has the key
be unavailable or dead, who will sign the release?  Furthermore, making
*limited* copies of the key does provide additional, limited
protection in case it's lost for some reason.  This helps put off the
use of the revocation key until it's absolutely required.  It also
provides for backup (of both key and people).

Basically, you are saying:
You trust a core developer
You trust they can protect their keys
You trust they can properly distribute their trust
You don't trust a core developer with a key

Hmmm...something smells in your web of trust...So, which is it?  Do you
trust the core developers to protect the interests of the project and
the associated key or not?  If not, why trust any digital signature from
them in the first place?

Can't stress this enough.  PKI is an absolute failure without trust. 
Period.


 Keys cannot be transfered from one person to another since, being digital
 data, there's no way to ascertain that the original holder does not still
 (on purpose or inadvertantly) have copies of the key. So in the case where
 we want to transfer trust from one person to another, we must also generate
 a new key and revoke the old one.
 

No one is talking about transferring keys.  In fact, I've previously
addressed this topic, from a different angle, a number of times.  We are
talking about shared trust and not transfered trust.  The transferring
of trust is done by signing keys, not transferring keys.

 This is now exactly equivalant to having each developer sign postgres
 with a signing key (signed by his main key) for which the other
 developers (or appropriate authority) have a revocation certificate.
 
 And back to the passphrase issue, once again, can't you see that it's
 completely irrelevant? At some point, someone who knows the passphrase is
 going to have to be in a position to use that to decrypt the key. At that
 point he has the key, period. Changing the passphrase does no good, because
 you can't change the passphrase on the copy of the key he may have made.
 

So you trust the core developer to sign the package but you don't trust
him to have the key that's required to sign it?  You can't have it both
ways.

 A passphrase is like a lock on your barn door. After you've given
 someone the key and he's gone in and taken the cow, changing the lock
 gives you no protection at all.


I can assure you I fully understand the implications and meaning of
everything I've said.


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 May I make a suggestion that maybe it is time to start thinking about
 tuning the default config file, IMHO its just a little bit too
 conservative,

It's a lot too conservative.  I've been thinking for awhile that we
should adjust the defaults.

The original motivation for setting shared_buffers = 64 was so that
Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
(64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
structures).  At one time SHMMAX=1M was a pretty common stock kernel
setting.  But our other data structures blew past the 1/2 meg mark
some time ago; at default settings the shmem request is now close to
1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
postgresql.conf settings, or preferably learn how to increase SHMMAX.
That means there is *no* defensible reason anymore for defaulting to
64 buffers. 

We could retarget to try to stay under SHMMAX=4M, which I think is
the next boundary that's significant in terms of real-world platforms
(isn't that the default SHMMAX on some BSDen?).  That would allow us
350 or so shared_buffers, which is better, but still not really a
serious choice for production work.

What I would really like to do is set the default shared_buffers to
1000.  That would be 8 meg worth of shared buffer space.  Coupled with
more-realistic settings for FSM size, we'd probably be talking a shared
memory request approaching 16 meg.  This is not enough RAM to bother
any modern machine from a performance standpoint, but there are probably
quite a few platforms out there that would need an increase in their
stock SHMMAX kernel setting before they'd take it.

So what this comes down to is making it harder for people to get
Postgres running for the first time, versus making it more likely that
they'll see decent performance when they do get it running.

It's worth noting that increasing SHMMAX is not nearly as painful as
it was back when these decisions were taken.  Most people have moved
to platforms where it doesn't even take a kernel rebuild, and we've
acquired documentation that tells how to do it on all(?) our supported
platforms.  So I think it might be okay to expect people to do it.

The alternative approach is to leave the settings where they are, and
to try to put more emphasis in the documentation on the fact that the
factory-default settings produce a toy configuration that you *must*
adjust upward for decent performance.  But we've not had a lot of
success spreading that word, I think.  With SHMMAX too small, you
do at least get a pretty specific error message telling you so.

Comments?

regards, tom lane




Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Bruno Wolff III
On Tue, Feb 11, 2003 at 10:41:53 -0500,
  Tom Lane [EMAIL PROTECTED] wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  So one of the items on the TODO list is Add hash for evaluating GROUP BY
  aggregates (Tom) 
 
 It's done in CVS tip ... give it a try.
 
  The neat thing is that hash aggregates would allow grouping on data types that
  have = operators but no useful  operator.
 
 Hm.  Right now I think that would barf on you, because the parser wants
 to find the '' operator to label the grouping column with, even if the
 planner later decides not to use it.  It'd take some redesign of the
 query data structure (specifically SortClause/GroupClause) to avoid that.

I think another issue is that for some '=' operators you still might not
be able to use a hash. I would expect the discussion of hash joins in
http://developer.postgresql.org/docs/postgres/xoper-optimization.html
would apply to hash aggregates as well.




Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
   Tom Lane [EMAIL PROTECTED] wrote:
 Greg Stark [EMAIL PROTECTED] writes:
 The neat thing is that hash aggregates would allow grouping on data types that
 have '=' operators but no useful '<' operator.
 
 Hm.  Right now I think that would barf on you, because the parser wants
 to find the '<' operator to label the grouping column with, even if the
 planner later decides not to use it.  It'd take some redesign of the
 query data structure (specifically SortClause/GroupClause) to avoid that.

 I think another issue is that for some '=' operators you still might not
 be able to use a hash. I would expect the discussion of hash joins in
 http://developer.postgresql.org/docs/postgres/xoper-optimization.html
 would apply to hash aggregates as well.

Right, the = operator must be hashable or you're out of luck.  But we
could imagine tweaking the parser to allow GROUP BY if it finds a
hashable = operator and no sort operator.  The only objection I can see
to this is that it means the planner *must* use hash aggregation, which
might be a bad move if there are too many distinct groups.

regards, tom lane




Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Patrick Welche
On Tue, Feb 11, 2003 at 11:20:14AM -0500, Tom Lane wrote:
...
 We could retarget to try to stay under SHMMAX=4M, which I think is
 the next boundary that's significant in terms of real-world platforms
 (isn't that the default SHMMAX on some BSDen?).
...

Assuming 1 page = 4k and the page counts in the GENERIC kernel configs are
correct, SHMMAX=4M for NetBSD (8M for i386, x86_64).

Cheers,

Patrick




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
  May I make a suggestion that maybe it is time to start thinking about
  tuning the default config file, IMHO its just a little bit too
  conservative,
 
 It's a lot too conservative.  I've been thinking for awhile that we
 should adjust the defaults.
 
 The original motivation for setting shared_buffers = 64 was so that
 Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
 (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
 structures).  At one time SHMMAX=1M was a pretty common stock kernel
 setting.  But our other data structures blew past the 1/2 meg mark
 some time ago; at default settings the shmem request is now close to
 1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
 postgresql.conf settings, or preferably learn how to increase SHMMAX.
 That means there is *no* defensible reason anymore for defaulting to
 64 buffers. 
 
 We could retarget to try to stay under SHMMAX=4M, which I think is
 the next boundary that's significant in terms of real-world platforms
 (isn't that the default SHMMAX on some BSDen?).  That would allow us
 350 or so shared_buffers, which is better, but still not really a
 serious choice for production work.
 
 What I would really like to do is set the default shared_buffers to
 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
 more-realistic settings for FSM size, we'd probably be talking a shared
 memory request approaching 16 meg.  This is not enough RAM to bother
 any modern machine from a performance standpoint, but there are probably
 quite a few platforms out there that would need an increase in their
 stock SHMMAX kernel setting before they'd take it.
 
 So what this comes down to is making it harder for people to get
 Postgres running for the first time, versus making it more likely that
 they'll see decent performance when they do get it running.
 
 It's worth noting that increasing SHMMAX is not nearly as painful as
 it was back when these decisions were taken.  Most people have moved
 to platforms where it doesn't even take a kernel rebuild, and we've
 acquired documentation that tells how to do it on all(?) our supported
 platforms.  So I think it might be okay to expect people to do it.
 
 The alternative approach is to leave the settings where they are, and
 to try to put more emphasis in the documentation on the fact that the
 factory-default settings produce a toy configuration that you *must*
 adjust upward for decent performance.  But we've not had a lot of
 success spreading that word, I think.  With SHMMAX too small, you
 do at least get a pretty specific error message telling you so.
 
 Comments?

I'd personally rather have people stumble trying to get PostgreSQL
running, up front, rather than allowing the lowest common denominator
more easily run PostgreSQL only to be disappointed with it and move on.

After it's all said and done, I would rather someone simply say, it's
beyond my skill set, and attempt to get help or walk away.  That seems
better than them being able to run it and say, it's a dog, spreading
word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
those that do walk away and claim it performs horribly are probably
doing more harm to the PostgreSQL community than expecting someone to be
able to install software ever can.

Nutshell:
Easy to install but is horribly slow.

or

Took a couple of minutes to configure and it rocks!



Seems fairly cut-n-dry to me.  ;)


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread mlw






Tom Lane wrote:

 Merlin Moncure [EMAIL PROTECTED] writes:

  May I make a suggestion that maybe it is time to start thinking about
  tuning the default config file, IMHO its just a little bit too
  conservative,

 It's a lot too conservative.  I've been thinking for awhile that we
 should adjust the defaults.

One of the things I did on my Windows install was to have a number of default
configuration files, postgresql.conf.small, postgresql.conf.medium, postgresql.conf.large.

Rather than choose one, the "initdb" script could ask for, or determine, the
amount of shared memory, memory, etc.

Another pet peeve I have is forcing the configuration files to be in the
database directory. We had this argument in 7.1 days, and I submitted a patch
that allowed a configuration file to be specified as a command line parameter.
One of the things that Oracle does better is separating the "configuration"
from the data. 

It is an easy patch to allow PostgreSQL to use a separate configuration
directory, and to specify the data directory within the configuration file
(the way any logical application works).  And, NO, symlinks are not a
solution; they are a kludge.
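One way the initdb-time selection could look (the template filenames are from
the message above; the memory cutoffs are invented purely for illustration):

```python
def pick_conf_template(ram_mb: int) -> str:
    """Choose a default config template by machine RAM.
    The cutoffs here are illustrative, not from the original message."""
    if ram_mb < 128:
        return "postgresql.conf.small"
    elif ram_mb < 1024:
        return "postgresql.conf.medium"
    return "postgresql.conf.large"

assert pick_conf_template(64) == "postgresql.conf.small"
assert pick_conf_template(512) == "postgresql.conf.medium"
assert pick_conf_template(4096) == "postgresql.conf.large"
```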




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Justin Clift
Tom Lane wrote:
snip

What I would really like to do is set the default shared_buffers to
1000.  That would be 8 meg worth of shared buffer space.  Coupled with
more-realistic settings for FSM size, we'd probably be talking a shared
memory request approaching 16 meg.  This is not enough RAM to bother
any modern machine from a performance standpoint, but there are probably
quite a few platforms out there that would need an increase in their
stock SHMMAX kernel setting before they'd take it.

snip

Totally agree with this.  We really, really, really, really need to get 
the default to a point where we have _decent_ default performance.

The alternative approach is to leave the settings where they are, and
to try to put more emphasis in the documentation on the fact that the
factory-default settings produce a toy configuration that you *must*
adjust upward for decent performance.  But we've not had a lot of
success spreading that word, I think.  With SHMMAX too small, you
do at least get a pretty specific error message telling you so.

Comments?


Yep.

Here's an *unfortunately very common* scenario that, again
unfortunately, a _seemingly large_ number of people fall for.

a) Someone decides to benchmark database XYZ vs PostgreSQL vs other 
databases

b) Said benchmarking person knows very little about PostgreSQL, so they 
install the RPM's, packages, or whatever, and it works.  Then they run 
whatever benchmark they've downloaded, or designed, or whatever

c) PostgreSQL, being practically unconfigured, runs at the pace of a 
slow, mostly-disabled snail.

d) Said benchmarking person gets better performance from the other 
databases (also set to their default settings) and thinks PostgreSQL 
has lots of features, and it's free, but it's Too Slow.

Yes, this kind of testing shouldn't even _pretend_ to have any real 
world credibility.

e) Said benchmarking person tells everyone they know, _and_ everyone 
they meet, about their results.  Some of them even create nice-looking 
or professional-looking web pages about it.

f) People who know even _less_ than the benchmarking person hear about 
the test, or read the result, and don't know any better than to believe 
it at face value.  So, they install whatever system was recommended.

g) Over time, the benchmarking person gets the hang of their chosen 
database more and writes further articles about it, and doesn't 
generally look any further afield than it for, say, a couple of years. 
By this time, they've already influenced a couple of thousand people 
in the non-optimal direction.

h) Arrgh.  With better defaults, our next release would _appear_ to be a 
lot faster to quite a few people, just because they have no idea about 
tuning.

So, as sad as this scenario is, better defaults will probably encourage 
a lot more newbies to get involved, and that'll eventually translate 
into a lot more experienced users, and a few more coders to assist.  ;-)

Personally I'd be a bunch happier if we set the buffers so high that we 
definitely have decent performance, and the people that want to run 
PostgreSQL are forced to make the choice of either:

 1) Adjust their system settings to allow PostgreSQL to run properly, or

 2) Manually adjust the PostgreSQL settings to run memory-constrained

This way, PostgreSQL either runs decently, or they are _aware_ that 
they're limiting it.  That should cut down on the false benchmarks 
(hopefully).

:-)

Regards and best wishes,

Justin Clift

			regards, tom lane



--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Steve Crawford
A quick-'n'-dirty first step would be more comments in postgresql.conf. Most 
of the lines are commented out, which would imply "use the default", but the 
default is not shown. (I realize this has the difficulty of defaults that 
change depending upon how PostgreSQL was configured/compiled but perhaps 
postgresql.conf could be built by the make process based on the configuration 
options.)

If postgresql.conf were commented with recommendations it would probably be 
all I need though perhaps a recommendation to edit that file should be 
displayed at the conclusion of make install.

Cheers,
Steve
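
A commented fragment along the lines Steve suggests might look like the
following. The values and annotations here are purely illustrative, not
the actual compiled-in defaults (which vary with build options):

```
#shared_buffers = 64     # default: 64 buffers of 8 KB (512 KB total);
                         # raise substantially for production use
#max_connections = 32    # default: 32; each backend also consumes
                         # a kernel semaphore
#sort_mem = 512          # per-sort working memory in KB, not shared
```

Showing the effective default on each commented-out line would remove
the guesswork described above.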


On Tuesday 11 February 2003 8:20 am, Tom Lane wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
  May I make a suggestion that maybe it is time to start thinking about
  tuning the default config file, IMHO its just a little bit too
  conservative,

 It's a lot too conservative.  I've been thinking for awhile that we
 should adjust the defaults.

 The original motivation for setting shared_buffers = 64 was so that
 Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
 (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
 structures).  At one time SHMMAX=1M was a pretty common stock kernel
 setting.  But our other data structures blew past the 1/2 meg mark
 some time ago; at default settings the shmem request is now close to
 1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
 postgresql.conf settings, or preferably learn how to increase SHMMAX.
 That means there is *no* defensible reason anymore for defaulting to
 64 buffers.

 We could retarget to try to stay under SHMMAX=4M, which I think is
 the next boundary that's significant in terms of real-world platforms
 (isn't that the default SHMMAX on some BSDen?).  That would allow us
 350 or so shared_buffers, which is better, but still not really a
 serious choice for production work.

 What I would really like to do is set the default shared_buffers to
 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
 more-realistic settings for FSM size, we'd probably be talking a shared
 memory request approaching 16 meg.  This is not enough RAM to bother
 any modern machine from a performance standpoint, but there are probably
 quite a few platforms out there that would need an increase in their
 stock SHMMAX kernel setting before they'd take it.

 So what this comes down to is making it harder for people to get
 Postgres running for the first time, versus making it more likely that
 they'll see decent performance when they do get it running.

 It's worth noting that increasing SHMMAX is not nearly as painful as
 it was back when these decisions were taken.  Most people have moved
 to platforms where it doesn't even take a kernel rebuild, and we've
 acquired documentation that tells how to do it on all(?) our supported
 platforms.  So I think it might be okay to expect people to do it.

 The alternative approach is to leave the settings where they are, and
 to try to put more emphasis in the documentation on the fact that the
 factory-default settings produce a toy configuration that you *must*
 adjust upward for decent performance.  But we've not had a lot of
 success spreading that word, I think.  With SHMMAX too small, you
 do at least get a pretty specific error message telling you so.

 Comments?

   regards, tom lane

 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Tom Lane
Justin Clift [EMAIL PROTECTED] writes:
 Personally I'd be a bunch happier if we set the buffers so high that we 
 definitely have decent performance, and the people that want to run 
 PostgreSQL are forced to make the choice of either:
   1) Adjust their system settings to allow PostgreSQL to run properly, or
   2) Manually adjust the PostgreSQL settings to run memory-constrained
 This way, PostgreSQL either runs decently, or they are _aware_ that 
 they're limiting it.

Yeah, that is the subtext here.  If you can't increase SHMMAX then you
can always trim the postgresql.conf parameters --- but theoretically,
at least, you should then have a clue that you're running a
badly-configured setup ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread mlw


Greg Copeland wrote:


 

I'd personally rather have people stumble trying to get PostgreSQL
running, up front, rather than allowing the lowest common denominator
more easily run PostgreSQL only to be disappointed with it and move on.

After it's all said and done, I would rather someone simply say, "it's
beyond my skill set", and attempt to get help or walk away.  That seems
better than them being able to run it and say, "it's a dog", spreading
word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
those that do walk away and claim it performs horribly are probably
doing more harm to the PostgreSQL community than expecting someone to be
able to install software ever can.

RANT

And that my friends is why PostgreSQL is still relatively obscure.

This attitude sucks. If you want a product to be used, you must put the 
effort into making it usable.

It is a no-brainer to make the default configuration file suitable for 
the majority of users. It is lunacy to create a default configuration 
which provides poor performance for over 90% of the users, but which 
allows the lowest common denominator to work.

A product must not perform poorly out of the box, period. A good product 
manager would choose one of two possible configurations, (a) a high 
speed fairly optimized system from the get-go, or (b) it does not run 
unless you create the configuration file. Option (c) out of the box it 
works like crap, is not an option.

This is why open source gets such a bad reputation. Outright contempt 
for the user who may not know the product as well as those developing 
it. This attitude really sucks and it turns people off. We want people 
to use PostgreSQL, to do that we must make PostgreSQL usable. Usability 
IS important.
/RANT



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Rod Taylor
On Tue, 2003-02-11 at 12:10, Steve Crawford wrote:
 A quick-'n'-dirty first step would be more comments in postgresql.conf. Most 

This will not solve the issue with the large number of users who have no
interest in looking at the config file -- but are interested in
publishing their results.

-- 
Rod Taylor [EMAIL PROTECTED]

PGP Key: http://www.rbt.ca/rbtpub.asc





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Kaare Rasmussen
 What if we supplied several sample .conf files, and let the user choose
 which to copy into the database directory?   We could have a high read

Exactly my first thought when reading the proposal for a setting suited for 
performance tests. 

 performance profile, and a transaction database profile, and a
 workstation profile, and a low impact profile.   We could even supply a

And a .benchmark profile :-)

 Perl script that would adjust SHMMAX and SHMMALL on platforms where this
 can be done from the command line.

Or maybe configuration could be adjusted with ./configure if SHMMAX can be 
determined at that point?

-- 
Kaare Rasmussen      Linux, spil            Tlf:   3816 2582
Kaki Data            tshirts, merchandize   Fax:   3816 2501
Howitzvej 75         Åben 12.00-18.00       Email: [EMAIL PROTECTED]
2000 Frederiksberg   Lørdag 12.00-16.00     Web:   www.suse.dk

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
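
A configure-time probe along the lines Kaare suggests might be sketched
as follows. This is only a sketch: it assumes a Linux-style /proc
interface, and the sizing rule (spend at most half the segment on
buffers, 8 KB each, mirroring Tom's half-for-other-structures
reasoning) is an illustrative heuristic, not project policy:

```shell
# Toy configure-time probe: detect SHMMAX and derive a shared_buffers
# candidate, leaving half the segment for other shared structures.
SHMMAX=$(cat /proc/sys/kernel/shmmax 2>/dev/null || echo 1048576)
BUFS=$(( SHMMAX / 2 / 8192 ))   # 8 KB per buffer
echo "detected SHMMAX: $SHMMAX"
echo "shared_buffers candidate: $BUFS"
```

With the stock Gentoo value reported later in this thread (33554432),
this rule would suggest 2048 buffers.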



Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 11:23, mlw wrote:
 Greg Copeland wrote:
 
   
 
 I'd personally rather have people stumble trying to get PostgreSQL
 running, up front, rather than allowing the lowest common denominator
 more easily run PostgreSQL only to be disappointed with it and move on.
 
 After it's all said and done, I would rather someone simply say, it's
 beyond my skill set, and attempt to get help or walk away.  That seems
 better than them being able to run it and say, it's a dog, spreading
 word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
 those that do walk away and claim it performs horribly are probably
 doing more harm to the PostgreSQL community than expecting someone to be
 able to install software ever can.
 
 RANT
 
 And that my friends is why PostgreSQL is still relatively obscure.
 
 This attitude sucks. If you want a product to be used, you must put the 
 effort into making it usable.
 


Ah..okay


 It is a no-brainer to make the default configuration file suitable for 
 the majority of users. It is lunacy to create a default configuration 
 which provides poor performance for over 90% of the users, but which 
 allows the lowest common denominator to work.
 

I think you read something into my email which I did not imply.  I'm
certainly not advocating a default configuration file assuming 512M of
shared memory or some such insane value.

Basically, you're arguing that they should keep doing exactly what they
are doing.  It's currently known to be causing problems and propagating
the misconception that PostgreSQL is unable to perform under any
circumstance.  I'm arguing that who cares if 5% of the potential user
base has to learn to properly install software.  Either they'll read and
learn, ask for assistance, or walk away.  All of which are better than
Jonny-come-lately offering up a meaningless benchmark which others are
happy to eat with rather large spoons.


 A product must not perform poorly out of the box, period. A good product 
 manager would choose one of two possible configurations, (a) a high 
 speed fairly optimized system from the get-go, or (b) it does not run 
 unless you create the configuration file. Option (c) out of the box it 
 works like crap, is not an option.
 

That's the problem.  Option (c) is what we currently have.  I'm amazed
that you even have a problem with option (a), as that's what I'm
suggesting.  The problem is, potentially for some minority of users, it
may not run out of the box.  As such, I'm happier with that situation
than with 90% of the user base being stuck with a crappy default
configuration.

Oddly enough, your option (b) is even worse than what you are ranting at
me about.  Go figure.

 This is why open source gets such a bad reputation. Outright contempt 
 for the user who may not know the product as well as those developing 
 it. This attitude really sucks and it turns people off. We want people 
 to use PostgreSQL, to do that we must make PostgreSQL usable. Usability 
 IS important.
 /RANT


There is no contempt here.  Clearly you've read your own bias into this
thread.  If you go back and re-read my posting, I think it's VERY clear
that it's entirely about usability.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Jon Griffin
FYI, my stock linux 2.4.19 gentoo kernel has:
kernel.shmall = 2097152
kernel.shmmax = 33554432

sysctl -a

So it appears that Linux, at least, is way above your 8 meg point, unless
I am missing something.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html
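
For readers on platforms where the stock value is lower than Jon's, the
Linux knob can be inspected and raised without a kernel rebuild. The
target below is only an example (1000 buffers at 8 KB, doubled for
other shared structures and headroom, roughly matching the 16 meg
request Tom estimates), not a recommendation:

```shell
# Inspect the current limit (read-only; falls back gracefully off Linux):
cat /proc/sys/kernel/shmmax 2>/dev/null || echo "(no /proc interface)"

# Example target: 1000 buffers * 8192 bytes, doubled for headroom.
WANT=$(( 1000 * 8192 * 2 ))
echo "example target SHMMAX: $WANT"

# To apply, as root (shown here, not executed):
#   echo $WANT > /proc/sys/kernel/shmmax
# or persistently in /etc/sysctl.conf:
#   kernel.shmmax = 16384000
```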



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Tom Lane
mlw [EMAIL PROTECTED] writes:
 This attitude sucks. If you want a product to be used, you must put the 
 effort into making it usable.
 [snip]

AFAICT, you are flaming Greg for recommending the exact same thing you
are recommending.  Please calm down and read again.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Tom Lane
Jon Griffin [EMAIL PROTECTED] writes:
 So it appears that linux at least is way above your 8 meg point, unless I
 am missing something.

Yeah, AFAIK all recent Linuxen are well above the range of parameters
that I was suggesting (and even if they weren't, Linux is particularly
easy to change the SHMMAX setting on).  It's other Unixoid platforms
that are likely to have a problem.  Particularly the ones where you
have to rebuild the kernel to change SHMMAX; people may be afraid to
do that.

Does anyone know whether cygwin has a setting comparable to SHMMAX,
and if so what is its default value?  How about the upcoming native
Windows port --- any issues there?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Robert Treat
On Tue, 2003-02-11 at 12:08, Justin Clift wrote:
 b) Said benchmarking person knows very little about PostgreSQL, so they 
 install the RPM's, packages, or whatever, and it works.  Then they run 
 whatever benchmark they've downloaded, or designed, or whatever
 

Out of curiosity, how feasible is it for the rpm/package/deb/exe
maintainers to modify their supplied postgresql.conf settings when
building said distribution?  AFAIK the minimum default SHMMAX setting on
Red Hat 8.0 is 32MB; it seems like bumping shared buffers to work with
that amount would be acceptable inside the 8.0 RPMs.

Robert Treat




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread mlw




Apology

After Mark calms down and, in fact, sees that Greg was saying the right thing
after all, chagrin is the only word.

I'm sorry.






[HACKERS] location of the configuration files

2003-02-11 Thread mlw
The debate on the configuration file sparked a memory of an old patch I 
submitted in 7.1 days.

One of the things I do not like about PostgreSQL, IMHO, is its 
backwards configuration process. Rather than specify a data directory, 
the administrator should specify a database configuration file. Within 
the configuration file is the location and names of the data directory 
and other information. Most admins want a central location for this 
information.

One of the things that is frustrating to me is to have to hunt down 
where the data directory is so that I can administer a DB. It can be 
anywhere, in any directory on any volume. If you had, say, a 
/usr/local/pgsql/admin, then you could have mydb.conf, which could then 
be checked in to CVS. A standard location for configuration files is 
normal practice, while the location of the data directory is less so. I 
just don't think the PG data directory should contain configuration 
information.

The original patch allowed a user to specify the location of the 
postgresql.conf file, rather than assuming it lived in $PGDATA. 
Also included in that patch was the ability to specify the location of 
the PGDATA directory, as well as the names of pg_hba.conf and the other 
configuration files.

It also allowed the user to use PG as it has always worked. The patch 
was not applied because a better solution was supposedly coming, but 
that was two major revisions ago. I would still like to see this 
functionality. Would anyone else?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread scott.marlowe
My other pet peeve is the default max connections setting.  This should be 
higher if possible, but of course, there's always the possibility of 
running out of file descriptors.

Apache has a default max children of 150, and if using PHP or another 
language that runs as an apache module, it is quite possible to use up all 
the pgsql backend slots before using up all the apache child slots.

Is setting the max connections to something like 200 reasonable, or likely 
to cause too many problems?


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] Windows SHMMAX (was: Default configuration)

2003-02-11 Thread Merlin Moncure
Does anyone know whether cygwin has a setting comparable to SHMMAX,
and if so what is its default value?  How about the upcoming native
Windows port --- any issues there?

From a pure win32 point of view, a good approach would be to use the
VirtualAlloc() memory allocation functions and set up a paged memory
allocation system.  From a very top down point of view, this is the
method of choice if portability is not an issue.  An abstraction to use
this technique within pg context is probably complex and requires
writing lots of win32 api code, which is obviously not desirable.

Another way of looking at it is memory mapped files.  This probably most
closely resembles unix shared memory and is the de facto standard way
for interprocess memory block sharing.  Sadly, performance will suffer
because you have to rely on the virtual memory system (think: writing to
files) to do a lot of stupid stuff you don't necessarily want or need.
The OS has to guarantee that the memory can be swapped out to file at
any time and therefore mirrors the pagefile to the allocated memory
blocks.

With the C++/C memory malloc/free api, you are supposed to be able to
get some of the benefits of virtual alloc (in particular, setting a
process memory allocation limit), but personal experience did not bear
this out.  However, this api sits directly over the virtual allocation
system and is the most portable.  The application has to guard against
fragmentation and things like that in this case.  In win32, server
thrashing is public enemy #1 for database servers, mostly due to the
virtual allocation system (which is quite fast when used right, btw).

Merlin




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-11 Thread Robert Treat
On Tue, 2003-02-11 at 13:44, mlw wrote:
 The debate on the configuration file sparked a memory of an old patch I 
 submitted in 7.1 days.
 
 One of the things I do not like about PostgreSQL, IMHO, is its 
 backwards configuration process. Rather than specify a data directory, 
 the administrator should specify a database configuration file. Within 
 the configuration file is the location and names of the data directory 
 and other information. Most admins want a central location for this 
 information.
 
 One of the things that is frustrating to me, is to have to hunt down 
 where the data directory is so that I can administrate a DB. It can be 
 anywhere, in any directory on any volume. If you had, say, a 
 /usr/local/pgsql/admin, then you could have mydb.conf which could then 
 be checked in to CVS. A standard location for configuration files is a 
 more normal process as the location of the data directory is less so. I 
 just don't think the PG data directory should contain configuration 
 information.
 
 The original patch allowed a user to specify the location of the 
 postgresql.conf file, rather than assuming it lived in $PGDATA
 Also included in that patch, was the ability to specify the location of 
 the PGDATA directory as well as the names of the pg_hba.conf and other 
 configuration files.
 
 It also allowed the user to use PG as it has always worked, The patch 
 was not applied because a better solution was supposedly coming, but 
 that was two major revisions ago. I would still like to see this 
 functionality. Would anyone else?
 

I'm going to be lazy and ask if you can post what the better solution
that was coming was (or a link to the thread). While I'll grant you that
the "it's coming" argument is pretty weak after two releases, the fact
that it may have been a better solution could still hold up.

Robert Treat




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Tom Lane
scott.marlowe [EMAIL PROTECTED] writes:
 Is setting the max connections to something like 200 reasonable, or likely 
 to cause too many problems?

That would likely run into number-of-semaphores limitations (SEMMNI,
SEMMNS).  We do not seem to have as good documentation about changing
that as we do about changing the SHMMAX setting, so I'm not sure I want
to buy into the "it's okay to expect people to fix this before they can
start Postgres the first time" argument here.

Also, max-connections doesn't silently skew your testing: if you need
to raise it, you *will* know it.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
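
For reference, the semaphore arithmetic alluded to above can be roughed
out as follows. The sizing rule (about one semaphore per allowed
connection, allocated in sets of 16, plus a small per-set overhead not
counted here) is my reading of the contemporary docs and should be
checked against the run-time documentation for your release:

```shell
# Rough estimate of semaphore demand for a given max_connections.
MAXCONN=200
SETS=$(( (MAXCONN + 15) / 16 ))   # semaphore sets  -> SEMMNI headroom
SEMS=$(( SETS * 16 ))             # semaphores      -> SEMMNS headroom
echo "semaphore sets needed: about $SETS (compare against SEMMNI)"
echo "semaphores needed:     about $SEMS (compare against SEMMNS)"
# On Linux, the four fields of kernel.sem are SEMMSL SEMMNS SEMOPM SEMMNI:
#   cat /proc/sys/kernel/sem
```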



Re: [HACKERS] location of the configuration files

2003-02-11 Thread mlw

Robert Treat wrote:


I'm going to be lazy and ask if you can post what the better solution
that was coming was (or a link to the thread). While I'll grant you that
the it's coming argument is pretty weak after two releases, that fact
that it may have been a better solution could still hold up.

Robert Treat
 

AFAIK it wasn't actually done. It was more of a "we should do something 
different" argument. At one point there was talk of rewriting the 
configuration system to allow includes and other things.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 12:55, Tom Lane wrote:
 scott.marlowe [EMAIL PROTECTED] writes:
  Is setting the max connections to something like 200 reasonable, or likely 
  to cause too many problems?
 
 That would likely run into number-of-semaphores limitations (SEMMNI,
 SEMMNS).  We do not seem to have as good documentation about changing
 that as we do about changing the SHMMAX setting, so I'm not sure I want
 to buy into the it's okay to expect people to fix this before they can
 start Postgres the first time argument here.
 
 Also, max-connections doesn't silently skew your testing: if you need
 to raise it, you *will* know it.
 

Besides, I'm not sure that it makes sense to let other product needs
dictate the default configurations for this one.  It would be one thing
if the vast majority of people only used PostgreSQL with Apache.  I know
I'm using it in environments in which no way relate to the web.  I'm
thinking I'm not alone.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Windows SHMMAX (was: Default configuration)

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 12:49, Merlin Moncure wrote:
 Does anyone know whether cygwin has a setting comparable to SHMMAX,
 and if so what is its default value?  How about the upcoming native
 Windows port --- any issues there?
 
 From a pure win32 point of view, a good approach would be to use the
 VirtualAlloc() memory allocation functions and set up a paged memory
 allocation system.  From a very top down point of view, this is the
 method of choice if portability is not an issue.  An abstraction to use
 this technique within pg context is probably complex and requires
 writing lots of win32 api code, which is obviously not desirable.
 
 Another way of looking at it is memory mapped files.  This probably most
 closely resembles unix shared memory and is the de facto standard way
 for interprocess memory block sharing.  Sadly, performance will suffer
 because you have to rely on the virtual memory system (think: writing to
 files) to do a lot of stupid stuff you don't necessarily want or need.
 The OS has to guarantee that the memory can be swapped out to file at
 any time and therefore mirrors the pagefile to the allocated memory
 blocks.
 
 With the C++/C memory malloc/free api, you are supposed to be able to
 get some of the benefits of virtual alloc (in particular, setting a
 process memory allocation limit), but personal experience did not bear
 this out.  However, this api sits directly over the virtual allocation
 system and is the most portable.  The application has to guard against
 fragmentation and things like that in this case.  In win32, server
 thrashing is public enemy #1 for database servers, mostly due to the
 virtual allocation system (which is quite fast when used right, btw).


IIRC, there is a mechanism which enables it to be directly
supported/mapped via the pagefile.  This is the preferred means of
memory mapped files unless you have a specific need which dictates
otherwise.  Meaning, it allows many supposed optimizations to be used
by the OS, as it is supposed to bypass some of the filesystem overhead.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Jeff Hoffmann
Tom Lane wrote:


I think that what this discussion is really leading up to is that we
are going to decide to apply the same principle to performance.  The
out-of-the-box settings ought to give reasonable performance, and if
your system can't handle it, you should have to take explicit action
to acknowledge the fact that you aren't going to get reasonable
performance.


What I don't understand is why this is such a huge issue.  Set it to a 
reasonable level (be it 4M or whatever the consensus is) and let the 
packagers worry about it if that's not appropriate.  Isn't it their job 
to provide a good out-of-the-package experience?  Won't they have better 
knowledge of what the system limits are for the platforms they package 
for?  Worst case, couldn't they ship a standard conf package and a special 
high-performance conf package in addition to all the base packages? 
After all, it's the users of the RPMs that are the real problem, not 
usually the people who compile it on their own.  If you were having 
problems with the compile-it-yourself audience, couldn't you just hit 
them over the head three or four times (configure, install, initdb, and 
failed startup, to name a few) reminding them to change it if it wasn't 
appropriate?  What more can you really do?  At some point, the end user 
has to bear some responsibility...

--

Jeff Hoffmann
PropertyKey.com




Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread scott.marlowe
On 11 Feb 2003, Greg Copeland wrote:

 On Tue, 2003-02-11 at 12:55, Tom Lane wrote:
  scott.marlowe [EMAIL PROTECTED] writes:
   Is setting the max connections to something like 200 reasonable, or likely 
   to cause too many problems?
  
  That would likely run into number-of-semaphores limitations (SEMMNI,
  SEMMNS).  We do not seem to have as good documentation about changing
  that as we do about changing the SHMMAX setting, so I'm not sure I want
  to buy into the it's okay to expect people to fix this before they can
  start Postgres the first time argument here.
  
  Also, max-connections doesn't silently skew your testing: if you need
  to raise it, you *will* know it.
  
 
 Besides, I'm not sure that it makes sense to let other product needs
 dictate the default configurations for this one.  It would be one thing
 if the vast majority of people only used PostgreSQL with Apache.  I know
 I'm using it in environments that in no way relate to the web.  I'm
 thinking I'm not alone.

True, but even so, 32 max connections is a bit light.  I have more 
pgsql databases than that on my box now.  My point in my previous answer 
to Tom was that you HAVE to shut down postgresql to change this.  It 
doesn't allocate tons of semaphores on startup, just when the child 
processes are spawned, and I'd rather have the user adjust their OS to 
meet the higher need than have to shut down and restart postgresql as 
well.  This is one of the settings that make it feel like a toy when you 
first open it.

How many other high quality databases in the whole world restrict max 
connections to 32?  The original choice of 32 was made because the original 
choice of 64 shared memory blocks was the most we could hope for on common 
OS installs.  Now that we're looking at cranking that up to 1000, 
shouldn't max connections get a look too?

You don't have to be using apache to need more than 32 simo connections.  
Heck, how many postgresql databases do you figure are in production with 
that setting still in there?  My guess is not many.

I'm not saying we should do this to make benchmarks better either, I'm 
saying we should do it to improve the user experience.  A limit of 32 
connects makes things tough for a beginning DBA, not only does he find out 
the problem while his database is under load the first time, but then he 
can't fix it without shutting down and restarting postgresql.  If the max 
is set to 200 or 500 and he starts running out of semaphores, that's a 
problem he can address while his database is still up and running in most 
operating systems, at least in the ones I use.

So, my main point is that any setting that requires you to shut down 
postgresql to make the change, we should pick a compromise value that 
means you never likely will have to shut down the database once you've 
started it up and it's under load.  shared buffers, max connects, etc... 
should not need tweaking for 95% or more of the users if we can help it.  
It would be nice if we could find a set of numbers that reduce the number 
of problems users have, so all I'm doing is looking for the sweetspot, 
which is NOT 32 max connections.
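
To put a rough number on the semaphore cost of a higher default, here is a
back-of-the-envelope sketch.  It assumes the PostgreSQL 7.x behavior of
roughly one semaphore per backend, allocated in sets of 16; both figures are
assumptions drawn from the 7.x sources, not a guarantee for other versions.

```shell
# Estimate System V semaphore demand for a given max_connections,
# assuming ~1 semaphore per backend, grabbed in sets of 16
# (PROC_NSEMS_PER_SET in the 7.x sources -- an assumption here).
max_connections=200
per_set=16
sets=$(( (max_connections + per_set - 1) / per_set ))   # ceiling division
echo "~${max_connections} semaphores in ${sets} sets; need SEMMNS >= ${max_connections}, SEMMNI >= ${sets}"
# prints: ~200 semaphores in 13 sets; need SEMMNS >= 200, SEMMNI >= 13
```

On Linux the current limits can be inspected with `cat /proc/sys/kernel/sem`
(the four fields are SEMMSL, SEMMNS, SEMOPM, SEMMNI).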





FW: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Merlin Moncure
True, but even so, 32 max connections is a bit light.  I have more 
pgsql databases than that on my box now.  My point in my previous answer
to Tom was that you HAVE to shut down postgresql to change this.  It 
doesn't allocate tons of semaphores on startup, 
[snip]

Is this correct?  I recall looking through the source and seeing
comments to the effect that it is better to allocate them all
(semaphores) up front in order to prevent runtime failed allocations.
(I could be totally off base on this.)

You don't have to be using apache to need more than 32 simo connections.
Heck, how many postgresql databases do you figure are in production with
that setting still in there?  My guess is not many.

[snip]
True, and it is not unheard of to put minimum specs for version x of the
database, i.e. 7.4 requires kernel 2.x and so on.

Here's the comment I was referring to:

/*
 * InitProcGlobal -
 *    initializes the global process table. We put it here so that
 *    the postmaster can do this initialization.
 *
 *    We also create all the per-process semaphores we will need to support
 *    the requested number of backends.  We used to allocate semaphores
 *    only when backends were actually started up, but that is bad because
 *    it lets Postgres fail under load --- a lot of Unix systems are
 *    (mis)configured with small limits on the number of semaphores, and
 *    running out when trying to start another backend is a common failure.
 *    So, now we grab enough semaphores to support the desired max number
 *    of backends immediately at initialization --- if the sysadmin has set
 *    MaxBackends higher than his kernel will support, he'll find out sooner
 *    rather than later.
 *
 *    Another reason for creating semaphores here is that the semaphore
 *    implementation typically requires us to create semaphores in the
 *    postmaster, not in backends.
 */

Merlin




Re: [HACKERS] new procedural language - PL/R

2003-02-11 Thread Joe Conway
Tom Lane wrote:

Joe Conway [EMAIL PROTECTED] writes:

2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and 
assuming plr is released under GPL, is there any chance that it would be 
accepted into src/pl or contrib, or should I start a gborg project (I'd prefer 
if it could at least live in contrib)?

I think we'd have to insist on gborg.  The only reason there are any
non-BSD-license items left in contrib is that I haven't finished my TODO
item to get their licenses changed or remove 'em.


[...snip...]


BSD would be good.  I agree that it'll be a pain in the neck to
maintain a PL that is not in the main tree, so I'd support accepting it
if we can get the license right.


I finally got a response from one of the core R developers: libR is 
GPL-ed, and that is unlikely to change -- so I guess gborg it is :-(
(not that I have anything against gborg ;-))

Before making any release announcements, I'd be interested in feedback 
if anyone feels so inclined. The source is currently available here:
  http://www.joeconway.com/plr/plr.0.1.1.alpha.tar.gz

The documentation, including preprocessed html, is in the tar ball. I've 
also posted the html docs here:
  http://www.joeconway.com/plr/index.html

From the README (more or less):
---
Installation:
  Place tar file in 'contrib' in the PostgreSQL source tree and untar.
  Then run:

make
make install
make installcheck

  You can use plr.sql to create the language and functions in your
  database of choice, e.g.

psql mydatabase < plr.sql
---

In addition to the documentation, the plr.out file in plr/expected is a 
good source of usage examples.

PL/R should build cleanly with PostgreSQL 7.3.x and CVS HEAD. It was 
developed using libR from R 1.6.2 under Red Hat 7.3 and 8.0 -- I've not 
tested against other versions of R or different OSes.

Please let me know how it goes.

Thanks,

Joe





[HACKERS] win32 port

2003-02-11 Thread Merlin Moncure

Has a final decision been made if the win32 port is going to
be threaded or not?

Merlin

Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Lamar Owen
On Tuesday 11 February 2003 13:03, Robert Treat wrote:
 On Tue, 2003-02-11 at 12:08, Justin Clift wrote:
  b) Said benchmarking person knows very little about PostgreSQL, so they
  install the RPM's, packages, or whatever, and it works.  Then they run
  whatever benchmark they've downloaded, or designed, or whatever

 Out of curiosity, how feasible is it for the rpm/package/deb/exe
 maintainers to modify their supplied postgresql.conf settings when
 building said distribution?  AFAIK the minimum default SHMMAX setting on
 Red Hat 8.0 is 32MB, seems like bumping shared buffers to work with that
 amount would be acceptable inside the 8.0 rpm's.

Yes, this is easy to do.  But what is a sane default?  I can patch any file 
I'd like to, but my preference is to patch as little as possible, as I'm 
trying to be generic here.  I can't assume Red Hat 8 in the source RPM, and 
my binaries are to be preferred only if the distributor doesn't have updated 
ones.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11





[HACKERS] Report on Japan, Belgium trip

2003-02-11 Thread Bruce Momjian
I have just returned from a two-week speaking trip to Japan and Belgium.
I spoke at three locations in Tokyo, and at FOSDEM in Brussels.  I have
updated my home page to list all the speeches, including one on
replication and a new marketing one:

http://candle.pha.pa.us/main/writings/computer.html

I had about 20-150 people at each presentation.  I had very few
questions about PostgreSQL vs. MySQL --- it seems this question has been
answered in our favor.  I did get lots of questions about PostgreSQL vs.
Oracle, which is the right question.  ;-)

FOSDEM was very successful.  I think we will do it again next year. 
Actually, I would like to expand our geographic coverage.  We have
O'Reilly for North America, and FOSDEM for Europe.  It would be good to
have something in Asia, Australia, and perhaps Russia and Canada. 
Ideally, we can provide 6-12 hours of content as part of an event that
has other presentations that would be of interest to PostgreSQL folks. 
FOSDEM and O'Reilly meet these criteria.  If you know of such an event,
please let me know or contact the event organizers and ask if they want
PostgreSQL as part of their event.

Let me share one question I got often during my trip: "Why should I use
PostgreSQL?"  In every case, my answer was that you should use
PostgreSQL because we have amassed a group of developers and committed
users that no other company can match.  The open source development
model allows us to improve PostgreSQL at an amazing rate.  If you look
at where we were seven years ago, and where we are now, it is impossible
to overestimate how advanced we will be seven years from now.  So, I
said, don't choose PostgreSQL only for where it is now, but for where it
is going.

I am home for ten days, then head to China and Japan for another two
weeks.  Those visits are not to attend conferences, but to meet with
companies using PostgreSQL.  I am taking my ten-year-old son with me.

I will start catching up on my email now.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




[HACKERS] Maximum Size for Large Object / TOASTed Object

2003-02-11 Thread Paul Ramsey
Hi All,

What is the maximum size a large object can be in PostgreSQL?
What is the maximum size a TOASTed object can be in PostgreSQL?

The PostgreSQL Limitations page says the maximum size of a field is 
1 Gb, but does a large object constitute a field? :)

Thanks,
Paul

--
  __
 /
 | Paul Ramsey
 | Refractions Research
 | Email: [EMAIL PROTECTED]
 | Phone: (250) 885-0632
 \_




Re: [HACKERS] Maximum Size for Large Object / TOASTed Object

2003-02-11 Thread Nigel J. Andrews
On Tue, 11 Feb 2003, Paul Ramsey wrote:

 Hi All,
 
 What is the maximum size a large object can be in PostgreSQL?
 What is the maximum size a TOASTed object can be in PostgreSQL?
 
 The PostgreSQL Limitations page says the maximum size of a field is 
 1 Gb, but does a large object constitute a field? :)
 
 Thanks,
 Paul

I don't know, but large objects are stored in the filesystem, so I presume any
limit is going to apply there.  A large object isn't a field; the large object
id can, and very probably should, be stored in one, though.


-- 
Nigel J. Andrews





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Rick Gigger
 On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
  Merlin Moncure [EMAIL PROTECTED] writes:
   May I make a suggestion that maybe it is time to start thinking about
   tuning the default config file, IMHO its just a little bit too
   conservative,
 
  It's a lot too conservative.  I've been thinking for awhile that we
  should adjust the defaults.
 
  The original motivation for setting shared_buffers = 64 was so that
  Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
  (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
  structures).  At one time SHMMAX=1M was a pretty common stock kernel
  setting.  But our other data structures blew past the 1/2 meg mark
  some time ago; at default settings the shmem request is now close to
  1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
  postgresql.conf settings, or preferably learn how to increase SHMMAX.
  That means there is *no* defensible reason anymore for defaulting to
  64 buffers.
 
  We could retarget to try to stay under SHMMAX=4M, which I think is
  the next boundary that's significant in terms of real-world platforms
  (isn't that the default SHMMAX on some BSDen?).  That would allow us
  350 or so shared_buffers, which is better, but still not really a
  serious choice for production work.
 
  What I would really like to do is set the default shared_buffers to
  1000.  That would be 8 meg worth of shared buffer space.  Coupled with
  more-realistic settings for FSM size, we'd probably be talking a shared
  memory request approaching 16 meg.  This is not enough RAM to bother
  any modern machine from a performance standpoint, but there are probably
  quite a few platforms out there that would need an increase in their
  stock SHMMAX kernel setting before they'd take it.
 
  So what this comes down to is making it harder for people to get
  Postgres running for the first time, versus making it more likely that
  they'll see decent performance when they do get it running.
 
  It's worth noting that increasing SHMMAX is not nearly as painful as
  it was back when these decisions were taken.  Most people have moved
  to platforms where it doesn't even take a kernel rebuild, and we've
  acquired documentation that tells how to do it on all(?) our supported
  platforms.  So I think it might be okay to expect people to do it.
 
  The alternative approach is to leave the settings where they are, and
  to try to put more emphasis in the documentation on the fact that the
  factory-default settings produce a toy configuration that you *must*
  adjust upward for decent performance.  But we've not had a lot of
  success spreading that word, I think.  With SHMMMAX too small, you
  do at least get a pretty specific error message telling you so.
 
  Comments?

 I'd personally rather have people stumble trying to get PostgreSQL
 running, up front, rather than allowing the lowest common denominator
 more easily run PostgreSQL only to be disappointed with it and move on.

 After it's all said and done, I would rather someone simply say, it's
 beyond my skill set, and attempt to get help or walk away.  That seems
 better than them being able to run it and say, it's a dog, spreading
 word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
 those that do walk away and claim it performs horribly are probably
 doing more harm to the PostgreSQL community than expecting someone to be
 able to install software ever can.

 Nutshell:
 Easy to install but is horribly slow.

 or

 Took a couple of minutes to configure and it rocks!



 Seems fairly cut-n-dry to me.  ;)

The type of person who can't configure it or doesn't think to try is
probably not doing a project that requires any serious performance.  As long
as you are running it on decent hardware postgres will run fantastically for
anything but a very heavy load.  I think there may be many people out there
who have little experience but want an RDBMS to manage their data.  Those
people need something very, very easy.  Look at the following that mysql
gets despite how poor a product it is.  It's very, very easy.  Mysql
works great for many people's needs, but then when they need to do something
real they need to move to a different database entirely.  I think there is a
huge advantage to having a product that can be set up very quickly out of
the box.  Those who need serious performance, hopefully for their employers'
sake, will be more likely to take a few minutes to do some quick performance
tuning.

Rick Gigger
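
Tom's arithmetic above is easy to check: shared buffers are 8 KB apiece, so
1000 of them is about 8 MB, and the total shared memory request (buffers plus
FSM and other structures) has to fit under the kernel's SHMMAX.  A minimal
sketch; the Linux sysctl name and the 128 MB example value are assumptions,
not a recommendation.

```shell
# 1000 shared_buffers at 8 KB each, versus a stock SHMMAX.
shared_buffers=1000
buffer_bytes=$(( shared_buffers * 8192 ))
echo "buffers alone: ${buffer_bytes} bytes"
# prints: buffers alone: 8192000 bytes
#
# On Linux the limit can be inspected and raised without a reboot:
#   cat /proc/sys/kernel/shmmax
#   sysctl -w kernel.shmmax=134217728    # 128 MB, for example
```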





Re: [HACKERS] PGP signing release

2003-02-11 Thread Curt Sampson
On Wed, 11 Feb 2003, Greg Copeland wrote:

 On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:

 [Re: everybody sharing a single key]

 This issue doesn't change regardless of the mechanism you pick.  Anyone
 that is signing a key must take reasonable measures to ensure the
 protection of their key.

Right. Which is why you really want to use separate keys: you can determine
who compromised a key if it is compromised, and you can revoke one without
having to revoke all of them.

Which pretty much inevitably leads you to just having the developers use
their own personal keys to sign the release.

 Basically, you are saying:
   You trust a core developer
   You trust they can protect their keys
   You trust they can properly distribute their trust
   You don't trust a core developer with a key

Not at all. I trust core developers with keys, but I see no reason to
weaken the entire system by sharing keys when it's not necessary. Having
each developer sign the release with his own personal key solves every
problem you've brought up.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC




Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread scott.marlowe
On Tue, 11 Feb 2003, Merlin Moncure wrote:

 Here's the comment I was referring to:
 
 /*
  * InitProcGlobal -
  *    initializes the global process table. We put it here so that
  *    the postmaster can do this initialization.
  *
  *    We also create all the per-process semaphores we will need to support
  *    the requested number of backends.  We used to allocate semaphores
  *    only when backends were actually started up, but that is bad because
  *    it lets Postgres fail under load --- a lot of Unix systems are
  *    (mis)configured with small limits on the number of semaphores, and
  *    running out when trying to start another backend is a common failure.
  *    So, now we grab enough semaphores to support the desired max number
  *    of backends immediately at initialization --- if the sysadmin has set
  *    MaxBackends higher than his kernel will support, he'll find out sooner
  *    rather than later.
  *
  *    Another reason for creating semaphores here is that the semaphore
  *    implementation typically requires us to create semaphores in the
  *    postmaster, not in backends.
  */

Interesting.  I was looking at the max number of file handles, but not 
semaphores.  I don't have to adjust the sem settings until I break 2047 
connections, about 10 times what I want to set the default to.  

With max connections set to 200 and buffers set to 1000, I pretty much 
can't run out of system resources on my box, only postgresql resources.

My box running RH72 has about 6500 free file handles out of the default 
8192 left when running 200 simo transactions with pgbench, and the 200 max 
connects setting would account for about 10% of the shared semaphore max 
on the box, well under what I'd worry about.

So, what OSes would have a problem, if any, with boosting something 
like max connects to 200?  What are the breaking points on other OSes?

I just downloaded FreeBSD 5.0 yesterday, so I'll try to get it installed 
and tested as well.






Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Curt Sampson
On Tue, 11 Feb 2003, Tom Lane wrote:

 It's a lot too conservative.  I've been thinking for awhile that we
 should adjust the defaults.

Some of these issues could be made to Just Go Away with some code
changes. For example, using mmap rather than SysV shared memory
would automatically optimize your memory usage, and get rid of the
double-buffering problem as well. If we could find a way to avoid using
semaphores proportional to the number of connections we have, then you
wouldn't have to worry about that configuration parameter, either.

In fact, some of this stuff might well improve our portability, too.
For example, mmap is a POSIX standard, whereas shmget is only an X/Open
standard. That makes me suspect that mmap is more widely available on
non-Unix platforms. (But I could be wrong.)

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC




Re: [HACKERS] PGP signing release

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 18:27, Curt Sampson wrote:
 On Wed, 11 Feb 2003, Greg Copeland wrote:
 
  On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
 
  [Re: everybody sharing a single key]
 
  This issue doesn't change regardless of the mechanism you pick.  Anyone
  that is signing a key must take reasonable measures to ensure the
  protection of their key.
 
 Right. Which is why you really want to use separate keys: you can determine
 who compromised a key if it is compromised, and you can revoke one without
 having to revoke all of them.
 
 Which pretty much inevitably leads you to just having the developers use
 their own personal keys to sign the release.
 
  Basically, you are saying:
  You trust a core developer
  You trust they can protect their keys
  You trust they can properly distribute their trust
  You don't trust a core developer with a key
 
 Not at all. I trust core developers with keys, but I see no reason to
 weaken the entire system by sharing keys when it's not necessary. Having
 each developer sign the release with his own personal key solves every
 problem you've brought up.
 
 cjs

Keep in mind, I've not been advocating so much as clarifying.  The
point being, having a shared key among trusted core developers is
hardly an additional risk.  After all, either they can be trusted or
they can't.

At this point, I think we both understand where the other stands. 
Either we agree or agree to disagree.  The next step is for the
developers to adopt which path they prefer to enforce and to ensure they
have the tools and knowledge at hand to support it.

Does anyone know if Tom and Bruce know each other well enough to sign each
other's keys outright, or failing that, via phone or via phone and
snail-mail?  That would put us off to an excellent start.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread scott.marlowe
On Tue, 11 Feb 2003, Rick Gigger wrote:

 The type of person who can't configure it or doesn't think to try is
 probably not doing a project that requires any serious performance.  As long
 as you are running it on decent hardware postgres will run fantastically for
 anything but a very heavy load.  I think there may be many people out there
 who have little experience but want an RDBMS to manage their data.  Those
 people need something very, very easy.  Look at the following that mysql
 gets despite how poor a product it is.  It's very, very easy.  Mysql
 works great for many people's needs, but then when they need to do something
 real they need to move to a different database entirely.  I think there is a
 huge advantage to having a product that can be set up very quickly out of
 the box.  Those who need serious performance, hopefully for their employers'
 sake, will be more likely to take a few minutes to do some quick performance
 tuning.

Very good point.  I'm pushing for changes that will NOT negatively impact 
joe beginner on the major platforms (Linux, BSD, Windows) in terms of 
install.  I figure anyone installing on big iron already knows enough 
about their OS we don't have to worry about shared buffers being too big 
for that machine.

So, a compromise of faster performance out of the box, with little or no 
negative user impact seems the sweet spot here.

I'm thinking a good knee setting for each one, where not too much memory / 
semaphores / file handles get gobbled up, but the database isn't pokey.

The poor performance of PostgreSQL in its current default configuration 
HAS cost us users, trust me, I know a few we've almost lost where I work 
that I converted after some quick tweaking of their database.

In its stock form PostgreSQL is very slow at large simple queries, like 
'select * from table1 t1 natural join table2 t2 where t1.field='a';' where 
you get back something like 10,000 rows.  The real bottleneck here is 
sort_mem.  A simple bump up to 8192 or so makes the database much more 
responsive.

If we're looking at changing default settings for 7.4, then we should look 
at changing ALL of them that matter, since we'll have the most time to 
shake out problems if we do them early, and we won't have four or five 
rounds of setting different defaults over time and finding the limitations 
of the HOST OSes one at a time.
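
One caveat worth quantifying before settling on a sort_mem default: sort_mem
is measured in kilobytes and applies per concurrent sort, not per server, so
the worst case grows with the number of active backends.  A rough sketch;
the 8192 figure is the value suggested above, the 200 backends are
hypothetical.

```shell
# Worst-case sort memory: sort_mem (in KB) times concurrently
# sorting backends.  Real usage is normally far lower.
sort_mem_kb=8192     # 8 MB per sort, as suggested above
backends=200         # hypothetical concurrent sorting backends
worst_case_mb=$(( sort_mem_kb * backends / 1024 ))
echo "worst case: ${worst_case_mb} MB"
# prints: worst case: 1600 MB
```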





Re: [HACKERS] Changing the default configuration

2003-02-11 Thread Tatsuo Ishii
 My other pet peeve is the default max connections setting.  This should be 
 higher if possible, but of course, there's always the possibility of 
 running out of file descriptors.
 
 Apache has a default max children of 150, and if using PHP or another 
 language that runs as an apache module, it is quite possible to use up all 
 the pgsql backend slots before using up all the apache child slots.
 
 Is setting the max connections to something like 200 reasonable, or likely 
 to cause too many problems?

Likely, yes. First, you will run out of kernel file descriptors. This could
be solved by increasing the kernel table or lowering
max_files_per_process, though. Second, total throughput will
rapidly decrease if you don't have enough RAM and many
CPUs. PostgreSQL cannot handle many concurrent
connections/transactions effectively. I recommend employing some kind
of connection pooling software and lowering max connections.
--
Tatsuo Ishii
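
Tatsuo's file-descriptor concern can be put in numbers: each backend may open
up to max_files_per_process files, so in the worst case the kernel table must
cover max_connections times that.  A sketch, assuming PostgreSQL's default
max_files_per_process of 1000 and a hypothetical 200 connections.

```shell
# Upper bound on kernel file descriptors PostgreSQL backends could
# consume; actual usage is usually much lower.
max_connections=200
max_files_per_process=1000   # PostgreSQL's default for this setting
fds=$(( max_connections * max_files_per_process ))
echo "worst case: ${fds} file descriptors"
# prints: worst case: 200000 file descriptors
# Compare on Linux: cat /proc/sys/fs/file-max
```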




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread scott.marlowe
On Wed, 12 Feb 2003, Curt Sampson wrote:

 On Tue, 11 Feb 2003, Tom Lane wrote:
 
  It's a lot too conservative.  I've been thinking for awhile that we
  should adjust the defaults.
 
 Some of these issues could be made to Just Go Away with some code
 changes. For example, using mmap rather than SysV shared memory
 would automatically optimize your memory usage, and get rid of the
 double-buffering problem as well. If we could find a way to avoid using
 semaphores proportional to the number of connections we have, then you
 wouldn't have to worry about that configuration parameter, either.
 
 In fact, some of this stuff might well improve our portability, too.
 For example, mmap is a POSIX standard, whereas shmget is only an X/Open
 standard. That makes me suspect that mmap is more widely available on
 non-Unix platforms. (But I could be wrong.)

I'll vote for mmap.  I use the mm libs with apache/openldap/authldap and 
it is very fast and pretty common nowadays.  It seems quite stable as 
well.





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Bruno Wolff III
On Tue, Feb 11, 2003 at 17:42:06 -0700,
  scott.marlowe [EMAIL PROTECTED] wrote:
 
 The poor performance of PostgreSQL in its current default configuration 
 HAS cost us users, trust me, I know a few we've almost lost where I work 
 that I converted after some quick tweaking of their database.

About two years ago I talked some people into trying it at work to
use with IMP/Horde which had been having some corruption problems
while using MySQL (though it wasn't necessarily a problem with MySQL).
I told them to be sure to use 7.1. When they tried it out it couldn't
keep up with the load. I asked the guys what they tried and found out
they couldn't find 7.1 rpms and didn't want to compile from source and
so ended up using 7.0.?. Also as far as I could tell from talking to them,
they didn't do any tuning at all. They weren't interested in taking another
look at it after that. We are still using MySQL with that system today.

One of our DBAs is using it for some trial projects (including one for me)
even though we have a site license for Oracle.




Re: [HACKERS] Changing the default configuration

2003-02-11 Thread scott.marlowe
On Wed, 12 Feb 2003, Tatsuo Ishii wrote:

  My other pet peeve is the default max connections setting.  This should be 
  higher if possible, but of course, there's always the possibility of 
  running out of file descriptors.
  
  Apache has a default max children of 150, and if using PHP or another 
  language that runs as an apache module, it is quite possible to use up all 
  the pgsql backend slots before using up all the apache child slots.
  
  Is setting the max connections to something like 200 reasonable, or likely 
  to cause too many problems?
 
 It likely will. First you will run out of kernel file descriptors. This could
 be solved by increasing the kernel table or lowering
 max_files_per_process, though. Second, the total throughput will
 rapidly decrease if you don't have enough RAM and many
 CPUs. PostgreSQL cannot handle many concurrent
 connections/transactions effectively. I recommend employing some kind
 of connection pooling software and lowering the max connections.

Don't know if you saw my other message, but increasing max connects to 200 
used about 10% of all my semaphores and about 10% of my file handles.  
That was while running pgbench to create 200 simo sessions.
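As a rough sketch of the arithmetic behind the file-descriptor concern (the worst-case multiplication model is an assumption for illustration; 1000 is PostgreSQL's usual max_files_per_process default):

```python
# Worst-case kernel file-descriptor demand if every backend opens its
# full per-process quota of files (a deliberately pessimistic model).
def fd_demand(max_connections: int, max_files_per_process: int = 1000) -> int:
    """Upper bound on kernel file table entries used by the backends."""
    return max_connections * max_files_per_process
```

Raising max_connections from 32 to 200 moves this worst case from 32,000 to 200,000 descriptors, which is why the kernel-wide table (fs.file-max, kern.maxfiles) has to be checked first.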

Keep in mind, on my fairly small intranet database server, I routinely 
have 32 connections, most coming from outside my webserver.  Probably no 
more than 4 or 5 connects at a time come from there.  These are all things 
like Windows boxes with ODBC running access or something similar.  Many of 
the connections are idle 98% of the time, and use little or no real 
resources, even getting swapped out should the server need the spare 
memory (it doesn't :-)  that machine is set to 120 max simos if I remember 
correctly.

While 200 may seem high, 32 definitely seems low.  So, what IS a good 
compromise for this and ALL the other settings that should probably be a 
bit higher?  I'm guessing sort_mem of 4 or 8 meg hits the knee for most 
folks, and the max fsm settings Tom has suggested make sense.

What wal_sync method should we make default?  Or should we pick one based 
on the OS the user is running?



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Contract Programmer Advice.

2003-02-11 Thread Lamar Owen
Being that this group of hackers is one I trust, and that this is a pretty 
common scenario for contract programming, I thought I'd ask this group a 
question.  I hope you don't mind.

I did some contract programming work for a to-be-nameless company a while 
back, and I'm having difficulty collecting.  The amount of the debt is around 
$1,400 US, so it's not something I am willing to just let go.  However, I've 
been patient for nearly two years: and, despite repeated promises to the 
contrary, I have yet to see this money.  They did pay me for the first $1,000 
or so, but have that $1,400 left to pay (I did a total of about $2,400 for 
them).  It was a subcontracting arrangement; they were contracting out to a 
client, and I was subcontracted.  According to the client, the bill was paid 
in full.

This company doesn't dispute any of my invoices and says they are going to pay 
me.  But they have not yet done so.  This company is still in business, and 
seems to be doing quite well.  Don't even ask their name; I'm not out to 
smear anyone.  Although I am tempted to publicly announce who they are, it is 
against my nature.  But my patience is wearing thin; and I'm willing to try 
just about anything.  Short of libel or slander, of course.  Whether they are 
honest or not doesn't mean I have carte blanche to be dishonest.

I just want my money _soon_.  This company has had no complaints about my 
performance or the quality of my work in this; in fact, in the communications 
I've had with them, they were always pleased with my work (or at least that's 
what they said).  And, of course, I kept copies of all correspondence.  In 
fact, I would do work for them again if I could be more sure of collecting 
the money.  Maybe I'm just too patient; I don't know.

Since I do trust this group, if any of you have had experience dealing with 
credit collection bureaus and the like, can you please e-mail me privately 
with your experience?
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Changing the default configuration

2003-02-11 Thread Tatsuo Ishii
  It likely will. First you will run out of kernel file descriptors. This could
  be solved by increasing the kernel table or lowering
  max_files_per_process, though. Second, the total throughput will
  rapidly decrease if you don't have enough RAM and many
  CPUs. PostgreSQL cannot handle many concurrent
  connections/transactions effectively. I recommend employing some kind
  of connection pooling software and lowering the max connections.
 
 Don't know if you saw my other message, but increasing max connects to 200 
 used about 10% of all my semaphores and about 10% of my file handles.  
 That was while running pgbench to create 200 simo sessions.

I'm not talking about semaphores. You see low usage of file
descriptors only because pgbench uses very few tables.

 Keep in mind, on my fairly small intranet database server, I routinely 
 have 32 connections, most coming from outside my webserver.  Probably no 
 more than 4 or 5 connects at a time come from there.  These are all things 
 like Windows boxes with ODBC running access or something similar.  Many of 
 the connections are idle 98% of the time, and use little or no real 
 resources, even getting swapped out should the server need the spare 
 memory (it doesn't :-)  that machine is set to 120 max simos if I remember 
 correctly.
 
 while 200 may seem high, 32 definitely seems low.  So, what IS a good 
 compromise?  for this and ALL the other settings that should probably be a 
 bit higher.  I'm guessing sort_mem or 4 or 8 meg hits the knee for most 
 folks, and the max fsm settings tom has suggested make sense.

32 is not too low if the kernel file descriptor limit is not
increased. Beware that running out of kernel file descriptors is a
serious problem for the entire system, not only for PostgreSQL.

 What wal_sync method should we make default?  Or should we pick one based 
 on the OS the user is running?

It really depends on the OS or kernel version. I have seen open_sync be
best for a certain version of the Linux kernel, while fdatasync was better
for another. I'm not sure, but it is possible that the file system type
also affects the wal_sync choice.
--
Tatsuo Ishii
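One way to see the OS dependence is a small timing loop. The sketch below is an illustrative micro-benchmark, not PostgreSQL's own wal_sync_method selection logic; it maps open_sync, fsync, and fdatasync onto O_SYNC, fsync(), and fdatasync() over an 8kB write pattern.

```python
# Sketch: time repeated 8kB synchronous writes with each candidate method.
import os
import tempfile
import time

def time_sync(method: str, writes: int = 20) -> float:
    """Seconds to write `writes` 8kB blocks using the given sync method."""
    flags = os.O_WRONLY
    if method == "open_sync":
        flags |= os.O_SYNC          # kernel syncs on every write()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    fd = os.open(path, flags)
    block = b"\0" * 8192
    start = time.perf_counter()
    for _ in range(writes):
        os.write(fd, block)
        if method == "fsync":
            os.fsync(fd)            # flush data and metadata
        elif method == "fdatasync":
            os.fdatasync(fd)        # flush data only
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.unlink(path)
    return elapsed
```

Running this on the actual WAL filesystem matters, since, as noted above, the filesystem type may change the ranking.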

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Christopher Kings-Lynne
Why don't we include a postgresql.conf.recommended along with our
postgresql.conf.sample.  That shouldn't be too hard.  We can just jack up
the shared buffers and wal buffers and everything - it doesn't matter if
it's not perfect, but it will at least give people an idea of what needs to
be increased, etc to get good results.

I'm currently benchmarking our new DB server before we put it into
production.  I plan to publish the results from that shortly.

Regards,

Chris

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of Merlin Moncure
 Sent: Tuesday, 11 February 2003 11:44 PM
 To: Greg Copeland
 Cc: PostgresSQL Hackers Mailing List; [EMAIL PROTECTED]
 Subject: Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks


 I've tested all the win32 versions of postgres I can get my hands on
 (cygwin and not), and my general feeling is that they have problems with
 insert performance with fsync() turned on, probably the fault of the os.
 Select performance is not so much affected.

 This is easily solved with transactions and other such things.  Also
 Postgres benefits from pl just like oracle.

 May I make a suggestion that maybe it is time to start thinking about
 tuning the default config file?  IMHO it's just a little bit too
 conservative, and it's hurting you in benchmarks being run by idiots, but
 it's still bad publicity.  Any real database admin would know his tests
 are synthetic and not meaningful without having to look at the #s.

 This is irritating me so much that I am going to put together a
 benchmark of my own, a real world one, on (publicly available) real
 world data.  Mysql is a real dog in a lot of situations.  The FCC
 publishes a database of wireless transmitters that has tables with 10
 million records in it.  I'll pump that into pg, run some benchmarks,
 real world queries, and we'll see who the faster database *really* is.
 This is just a publicity issue, that's all.  It's still annoying though.

 I'll even run an open challenge to database admin to beat query
 performance of postgres in such datasets, complex multi table joins,
 etc.  I'll even throw out the whole table locking issue and analyze
 single user performance.

 Merlin



 _
 How much of the performance difference is from the RDBMS, from the
 middleware, and from the quality of implementation in the middleware?

 While I'm not surprised that the cygwin version of PostgreSQL is
 slow, those results don't tell me anything about the quality of the
 middleware interface between PHP and PostgreSQL.  Does anyone know if we
 can rule out some of the performance loss by pinning it to bad
 middleware implementation for PostgreSQL?


 Regards,

 --
 Greg Copeland [EMAIL PROTECTED]
 Copeland Computer Consulting




 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?

 http://archives.postgresql.org



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Christopher Kings-Lynne
 After it's all said and done, I would rather someone simply say, it's
 beyond my skill set, and attempt to get help or walk away.  That seems
 better than them being able to run it and say, it's a dog, spreading
 word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
 those that do walk away and claim it performs horribly are probably
 doing more harm to the PostgreSQL community than expecting someone to be
 able to install software ever can.
 
 RANT

 And that my friends is why PostgreSQL is still relatively obscure.

Dude - I hang out on PHPBuilder's database forums and you wouldn't believe
how often the "oh, don't use Postgres, it has a history of database
corruption problems" thing is mentioned.

Chris



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Contract Programmer Advice.

2003-02-11 Thread Jeroen T. Vermeulen
On Tue, Feb 11, 2003 at 08:56:56PM -0500, Lamar Owen wrote:
 
 This company doesn't dispute any of my invoices and says they are going to pay 
 me.  But they have not yet done so.  This company is still in business, and 

I wouldn't know about your country, whichever that may be, but in the
Netherlands it only takes two unpaid creditors to request bankruptcy
for a company with a judge.  Bankruptcy is declared when a company no
longer pays its dues (note that ability doesn't come into this), and 
two outstanding debts are the minimum required to establish that.

If all else fails, that kind of procedure gives a very strong incentive
to pay up.  Your country may have a similar arrangement.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Contract Programmer Advice.

2003-02-11 Thread Dave Smith
In Canada we have small claims court, for amounts up to $10,000, and it only 
costs you $50 to file a claim. They have to file a defense or settle within 30 
days. Usually, if they owe you the money, it forces them to do something: 
either settle or *really* drag it out, but it gets the process moving.




Jeroen T. Vermeulen wrote:

On Tue, Feb 11, 2003 at 08:56:56PM -0500, Lamar Owen wrote:
 

This company doesn't dispute any of my invoices and says they are going to pay 
me.  But they have not yet done so.  This company is still in business, and 
   


I wouldn't know about your country, whichever that may be, but in the
Netherlands it only takes two unpaid creditors to request bankruptcy
for a company with a judge.  Bankruptcy is declared when a company no
longer pays its dues (note that ability doesn't come into this), and 
two outstanding debts are the minimum required to establish that.

If all else fails, that kind of procedure gives a very strong incentive
to pay up.  Your country may have a similar arrangement.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
 



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Heads up: Telegraph research project at UC Berkeley

2003-02-11 Thread Joe Hellerstein
Hi all:
	I wanted to give the PostgreSQL community a heads up on ongoing 
database research here at UC Berkeley, which may be of interest since 
it uses PGSQL.

The last few years we've been building a system called Telegraph, which 
has a number of research thrusts:

a) aggressively adaptive query optimization (based on the eddy 
concept)

b) queries over external data sources, including pull (e.g. by 
scraping web pages) and push (e.g. data feeds from sensors)

c) support for Continuous Queries over streaming push sources, 
including multi-query optimization and fault-tolerant parallelism

The first version of the system was written from scratch in Java, which 
caused us some pain.  About 8 months ago we abandoned our initial 
version of the system, and decided to start over in C.  Rather than 
starting over from scratch, we decided to take PostgreSQL and enhance 
it (and also disembowel it somewhat) for our purposes.  We're planning 
on releasing an alpha of the new system, called TelegraphCQ, in a 
couple weeks.  It will include some but not all of the above features, 
and we'd be happy to have folks kick the tires a little.

It will be interesting to see how/if our modifications can be of use to 
the broader PGSQL community.  I encourage interested folks to have a 
look at our website at http://telegraph.cs.berkeley.edu, read some of 
the papers, etc.  The best overview is our recent CIDR paper, 
http://www.cs.berkeley.edu/~franklin/Papers/TCQcidr03.pdf

(Please see the related note on how we're now using PostgreSQL in our 
database classes both at Berkeley and CMU.)

Regards,

Joe Hellerstein

--

Joseph M. Hellerstein
Professor, EECS Computer Science Division
UC Berkeley
http://www.cs.berkeley.edu/~jmh


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Berkeley and CMU classes adopt/extend PostgreSQL

2003-02-11 Thread Tom Lane
Joe Hellerstein [EMAIL PROTECTED] writes:
 I am teaching the undergrad DB course at UC Berkeley, something I do 
 with some frequency.  We have the usual 180 students  we get every 
 semester (yep: 180!), but this year we've instituted 2 changes:
 1) We changed the course projects to make the students hack PostgreSQL 
 internals, rather than the minibase eduware

Cool.

 2) We are coordinating the class with a class at CMU being taught by 
 Prof. Anastassia (Natassa) Ailamaki

Double cool.  I'm just down the road, if Natassa needs a visiting
lecturer.

 Our Homework 2, which is being passed out this week, will ask the 
 students to implement a hash-based grouping that spills to disk.  I 
 understand this topic has been batted about the pgsql-hackers list 
 recently.

Yes.  As of CVS tip, we have hash-based grouping but it doesn't spill
to disk.  Want to ask them to start from CVS tip and fix that little
detail?  Or fix the various other loose ends that have been mentioned
lately?  (make it work with DISTINCT, improve the estimation logic,
some other things I'm forgetting)

 I'm hopeful this will lead to many good things:

Yes, let's see what we can do with this ... seems like Postgres may
be coming full circle ;-)

regards, tom lane
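For the curious, the "spills to disk" idea can be sketched like this (a toy model, not the PostgreSQL executor: Python lists stand in for temp files, and a per-key count stands in for arbitrary aggregate state).

```python
# Sketch: hash-based grouping with a memory cap.  Keys that don't fit are
# partitioned out ("spilled") and aggregated in a later recursive pass.
def hash_group_count(rows, mem_limit=1000, fanout=4):
    """Count occurrences per key without ever holding more than
    mem_limit distinct keys in the in-memory hash table."""
    table = {}
    spill = [[] for _ in range(fanout)]   # stand-ins for temp files on disk
    for key in rows:
        if key in table:
            table[key] += 1               # existing group: stays in memory
        elif len(table) < mem_limit:
            table[key] = 1                # new group, room left
        else:
            spill[hash(key) % fanout].append(key)   # no room: spill
    for part in spill:
        if part:
            # Spilled keys are disjoint from in-memory keys, so a plain
            # merge of the recursive result is safe.
            table.update(hash_group_count(part, mem_limit, fanout))
    return table
```

Each recursive pass absorbs up to mem_limit more distinct keys, so the recursion terminates even for skewed partitions.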

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Tom Lane
scott.marlowe [EMAIL PROTECTED] writes:
 ... The original choice of 32 was set because the original 
 choice of 64 shared memory blocks was the most we could hope for on common 
 OS installs.  Now that we're looking at cranking that up to 1000, 
 shouldn't max connections get a look too?

Actually I think max-connections at 32 was set because of SEMMAX limits,
and had only the most marginal connection to shared_buffers (anyone care
to troll the archives to check?)  But sure, let's take another look at
the realistic limits today.

 ... If he starts running out of semaphores, that's a 
 problem he can address while his database is still up and running in most 
 operating systems, at least in the ones I use.

Back in the day, this took a kernel rebuild and system reboot to fix.
If this has changed, great ... but on exactly which Unixen can you
alter SEMMAX on the fly?

 So, my main point is that any setting that requires you to shut down 
 postgresql to make the change, we should pick a compromise value that 
 means you never likely will have to shut down the database once you've 
 started it up and it's under load.

When I started using Postgres, it did not allocate the max number of
semas it might need at startup, but was instead prone to fail when you
tried to open the 17th or 33rd or so connection.  It was universally
agreed to be an improvement to refuse to start at all if we could not
meet the specified max_connections setting.  I don't want to backtrack
from that.  If we can up the default max_connections setting, great ...
but let's not increase the odds of failing under load.

regards, tom lane
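A sketch of the startup check being described (the "roughly one semaphore per connection, allocated in sets of 16" rule is an assumption drawn from this discussion; check the release's ipc code for exact numbers).

```python
# Sketch: will a given max_connections fit under the kernel-wide SysV
# semaphore limit (SEMMNS)?  Assumes one semaphore per connection slot,
# allocated in whole sets of 16.
def semaphores_needed(max_connections: int, set_size: int = 16) -> int:
    sets = -(-max_connections // set_size)   # ceiling division
    return sets * set_size

def fits_kernel_limits(max_connections: int, semmns: int) -> bool:
    """True if startup's up-front semaphore grab can succeed."""
    return semaphores_needed(max_connections) <= semmns
```

On Linux the current limits can be read from /proc/sys/kernel/sem (SEMMSL SEMMNS SEMOPM SEMMNI).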

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-11 Thread Curt Sampson

I, personally, also think it makes more sense to pass to the postmaster
a configuration file that contains all the rest of the information about
the database system, including the disk locations of the various data
directories and whatnot.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Christopher Kings-Lynne
Hi Everyone,

I have just completed a basic set of benchmarking on our new database
server.  I wanted to figure out a good value for shared_buffers before we go
live.

We are a busy ecommerce-style website and so we probably get 10 or 20 to 1
read transactions vs. write transactions.  We also don't have particularly
large tables.

Attached are the charts for select only and tpc-b runs.  Also attached is an
OpenOffice.org spreadsheet with all the results, averages and charts.  I
place all these attachments in the public domain, so you guys can use them
how you wish.

I installed pgbench, and set up a pgbench database with scale factor 1.

I then set shared_buffers to all the values between 2000 and 11000 and
tested select and tcp-b with each.  I ran each test 3 times and averaged the
values.  TPC-B was run after select so had advantages due to the buffers
already being filled, but I was consistent with this.

Machine:
256MB RAM, FreeBSD 4.7, EIDE HDD,  1 Ghz

TPC-B config:
pgbench -c 64 -t 100 pgbench (Note: only 64 users here)

SELECT config:
pgbench -c 128 -t 100 -S pgbench (Note: full 128 users here)

I'm not sure why 8000 and 9000 are low on tpc-b, it's odd.

Anyway, from the attached results you can see that 4000 buffers gave the
best SELECT only performance, whereas the TPC-B stuff seemed to max out way
up at 10000 or so.  Since there is a 20% gain in performance on TPC-B going
from 4000 buffers to 5000 buffers and only a 2% loss in performance for
SELECTs, I have configured my server to use 5000 shared buffers, eg. 45MB
RAM.

I am now going to leave it on 5000 and play with wal_buffers.  Is there
anything else people are interested in me trying?

Later on, I'll run pg_autotune to see how its recommendation matches my
findings.

Chris
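The "ran each test 3 times and averaged" step generalizes to a small helper like this sketch (the command is a placeholder; a real run would wrap `pgbench -c 128 -t 100 -S pgbench` and parse the tps figure from its output).

```python
# Sketch: run a benchmark command several times and average the single
# float it prints on stdout (placeholder for parsing pgbench's tps line).
import subprocess

def average_runs(cmd, runs=3):
    results = []
    for _ in range(runs):
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results.append(float(out.stdout.strip()))
    return sum(results) / len(results)
```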

attachment: pg_select.gif
attachment: pg_tpcb.gif

PostgreSQL Benchmark.sxc
Description: OpenOffice Calc spreadsheet

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Dann Corbit
 -Original Message-
 From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED]] 
 Sent: Tuesday, February 11, 2003 8:54 PM
 To: Hackers; Advocacy
 Subject: [HACKERS] PostgreSQL Tuning Results
 
 
 Hi Everyone,
 
 I have just completed a basic set of benchmarking on our new 
 database server.  I wanted to figure out a good value for 
 shared_buffers before we go live.
 
 We are a busy ecommerce-style website and so we probably get 
 10 or 20 to 1 read transactions vs. write transactions.  We 
 also don't have particularly large tables.
 
 Attached are the charts for select only and tpc-b runs.  Also 
 attached is an OpenOffice.org spreadsheet with all the 
 results, averages and charts.  I place all these attachments 
 in the public domain, so you guys can use them how you wish.
 
 I installed pgbench, and set up a pgbench database with scale 
 factor 1.
 
 I then set shared_buffers to all the values between 2000 and 
 11000 and tested select and tcp-b with each.  I ran each test 
 3 times and averaged the values.  TPC-B was run after select 
 so had advantages due to the buffers already being filled, 
 but I was consistent with this.
 
 Machine:
 256MB RAM, FreeBSD 4.7, EIDE HDD,  1 Ghz
 
 TPC-B config:
 pgbench -c 64 -t 100 pgbench (Note: only 64 users here)
 
 SELECT config:
 pgbench -c 128 -t 100 -S pgbench (Note: full 128 users here)
 
 I'm not sure why 8000 and 9000 are low on tpc-b, it's odd.
 
 Anyway, from the attached results you can see that 4000 
 buffers gave the best SELECT only performance, whereas the 
 TPC-B stuff seemed to max out way up at 10000 or so.  Since 
 there is a 20% gain in performance on TPC-B going from 4000 
 buffers to 5000 buffers and only a 2% loss in performance for 
 SELECTs, I have configured my server to use 5000 shared 
 buffers, eg. 45MB RAM.
 
 I am now going to leave it on 5000 and play with wal_buffers. 
  Is there anything else people are interested in me trying?

Keenly interested.  Who wouldn't want to know how to optimize it?
That's the hardest guideline to find.
 
 Later on, I'll run pg_autotune to see how its recommendation 
 matches my findings.

I would like to hear about that also.  Please report on it.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [pgsql-advocacy] [HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Christopher Kings-Lynne
  I am now going to leave it on 5000 and play with wal_buffers.
   Is there anything else people are interested in me trying?

 Keenly interested.  Who wouldn't want to know how to optimize it?
 That's the hardest guideline to find.

Oops - what that sentence was supposed to say is Is there anyone else
interested in me trying any other variables?

What I don't really know is what is actually affected by wal_buffers.  I
assume my select-only tests won't even touch the WAL, so I guess I have to
just play with tpc-b.

Chris


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Gavin Sherry
Hi Chris,

On Wed, 12 Feb 2003, Christopher Kings-Lynne wrote:

 Machine:
 256MB RAM, FreeBSD 4.7, EIDE HDD,  1 Ghz

Seems like a small amount of memory to be doing memory-based tests with.

What about testing sort_mem as well? It would seem to me that there
would be no negative to having infinite sort_mem given infinite memory,
though.

Gavin


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Christopher Kings-Lynne
  Machine:
  256MB RAM, FreeBSD 4.7, EIDE HDD,  1 Ghz

 Seems like a small amount of memory to be memory based tests with.

Perhaps, but I'm benchmarking for that machine, not for any other.  The
results have to include the 256MB spec.

Also, the peak was 25MB of SHM, which still leaves 231MB for the rest of the
system, so surely RAM is not the bottleneck here?

 What about testing sort_mem as well? It would seem to me that there
 would be no negative to having infinite sort_mem given infinite memory,
 though.

Yeah, however I'm pretty sure that pgbench doesn't perform any sorts.

I reckon that sort_mem is the hardest thing to optimise!

Chris


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Projection while performing joins.

2003-02-11 Thread Tom Lane
Anagh Lal [EMAIL PROTECTED] writes:
 ... I am still confused by the following:
 In /backend/executor/nodeMergeJoin.c
 in ExecMergeJoin()
 In the state (the switch case)  EXEC_MJ_JOINTUPLES
 we still do ExecProject(), what does this do?

Well, sure.  A join node *must* do a projection, no?  It can't simply
return either the left or the right input tuple (except in the
vanishingly-small fraction of cases where you don't actually need any
columns from the right or the left respectively; which are cases that
we don't currently bother to optimize).  To create a tuple that's not
exactly one or the other you must project.

 Some food for thought,
 Let's ignore the attributes listed in the select
 clause
 and work only with the where clause (join condition)
 attributes. And as a back reference store the 
 tupleid of the original whole tuple in the working
 tuple. At the final output stage perform a lookup to
 retrieve the select clause attributes of only the
 qualifying tuple. Thus enabling us to work with really
 small sized data. 
 worth trying out?

Not sure there's a lot of traction here.  In many cases, the
bottom-level scan gets advanced one or more rows before the top-level
nodes can pop out a result.  (Consider GROUP BY as an example.)
I don't see the advantage of adding bookkeeping/copying for past
rows in order to avoid copying current-row data around.

But feel free to prove me wrong ;-)

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Shridhar Daithankar[EMAIL PROTECTED]
On Tuesday 11 Feb 2003 8:01 pm, Mario Weilguni wrote:
 Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
 benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
 Win32 isn't really fair:
 
 http://php.weblogs.com/oracle_mysql_performance

 And why is the highly advocated transaction capable MySQL 4 not tested?
 That's the problem, for every performance test they choose ISAM tables, and
 when transactions are mentioned it's said MySQL has transactions. But why
 no benchmarks?

I did benchmark MySQL/PostgreSQL/Oracle some time back. MySQL with transactions 
is 90% as fast as PostgreSQL, but it dies down with an increasing number of users 
no matter how many resources you throw at it.

Oracle is 130% of PostgreSQL. This was the PostgreSQL 7.2.x series, so things have 
changed for sure, but you get the idea, right?

 Shridhar

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Shridhar Daithankar[EMAIL PROTECTED]
On Tuesday 11 Feb 2003 10:56 pm, you wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
  What if we supplied several sample .conf files, and let the user choose
  which to copy into the database directory?   We could have a high read
  performance profile, and a transaction database profile, and a
  workstation profile, and a low impact profile.

 Uh ... do we have a basis for recommending any particular sets of
 parameters for these different scenarios?  This could be a good idea
 in the abstract, but I'm not sure I know enough to fill in the details.

Let's take a very simple scenario for supplying a pre-configured postgresql.conf.

Assume that SHMMAX = total memory/2 and supply different config files for

64MB/128MB/256MB/512MB and above.

Is that simple enough?

 Shridhar
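As a sketch of how such a selection could work (file names and thresholds here are hypothetical, matching the memory classes proposed above):

```python
# Sketch: choose a pre-tuned sample config by total RAM, and compute the
# SHMMAX the proposal assumes (half of physical memory).
def pick_conf_profile(total_ram_mb: int) -> str:
    for threshold, name in [(512, "postgresql.conf.512MB"),
                            (256, "postgresql.conf.256MB"),
                            (128, "postgresql.conf.128MB")]:
        if total_ram_mb >= threshold:
            return name
    return "postgresql.conf.64MB"     # smallest class: 64MB and below

def assumed_shmmax_bytes(total_ram_mb: int) -> int:
    return total_ram_mb * 1024 * 1024 // 2    # SHMMAX = total memory / 2
```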

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Maximum Size for Large Object / TOASTed Object

2003-02-11 Thread Tom Lane
Paul Ramsey [EMAIL PROTECTED] writes:
 What is the maximum size a large object can be in PostgreSQL?
 What is the maximum size a TOASTed object can be in PostgreSQL?

TOASTed fields max out at 1Gb.  Large objects max out at either 2 or 4 Gb
... I'm not sure whether we consistently use signed or unsigned
arithmetic for them.  (Probably they *should* max out at 4Gb, and if you
find any places that fail over 2Gb, those are bugs for which patches
will be gladly accepted.)

regards, tom lane
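The 1 GB / 2 GB / 4 GB ceilings come straight from the width of the length and offset arithmetic; a quick sketch of the numbers:

```python
# Sketch: the size ceilings implied by 30/31/32-bit length or offset math.
TOAST_FIELD_MAX = 2**30        # 1 GB: TOASTed-field limit quoted above
SIGNED_32_MAX = 2**31 - 1      # ~2 GB: cap if any code path uses signed int32
UNSIGNED_32_MAX = 2**32 - 1    # ~4 GB: cap with consistently unsigned math
```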

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread ow
There's The Open Source Database Benchmark,
http://osdb.sourceforge.net/.

Anyone tried to use it?






---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Windows SHMMAX (was: Default configuration)

2003-02-11 Thread Curt Sampson
On Tue, 11 Feb 2003, Merlin Moncure wrote:

 Another way of looking at it is memory mapped files.  This probably most
 closely resembles unix shared memory and is the de facto standard way
 for interprocess memory block sharing.  Sadly, performance will suffer
 because you have to rely on the virtual memory system (think: writing to
 files) to do a lot of stupid stuff you don't necessarily want or need.

To the contrary, for the majority of the shared memory usage of
postgres, which is cached file data, the virtual memory system is doing
exactly what you want it to: managing the movement of data between
memory and disk, and caching the more frequently accessed data to reduce
the chances you will actually need to access the disk for it.

For shared memory used only for IPC, typically a VM system treats it no
differently from any other non-shared memory, so if it's doing something
you don't want or need (a proposition I quite heartily disagree with),
it's going to be doing that for every piece of memory your application
allocates and uses, shared or not.

 The OS has to guarantee that the memory can be swapped out to file at
 any time and therefore mirrors the pagefile to the allocated memory
 blocks.

The OS does not need to write the pagefile. On modern Unix systems that
are not allowing overcommit, the space will be allocated but never
written unless there's a need to free up some physical memory, and the
pages in question are used infrequently enough that the system decides
that they are good candidates to be paged out. I would imagine that
Windows does the same.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC




Re: [HACKERS] location of the configuration files

2003-02-11 Thread Kevin Brown
mlw wrote:
> AFAIK it wasn't actually done. It was more of a "we should do something
> different" argument. At one point there was talk about rewriting the
> configuration system to allow include and other things.

That seems like extreme overkill.  The PostgreSQL configuration
mechanism doesn't seem to me to be anywhere near complicated enough to
justify an include mechanism.

I agree with you: you should be able to specify all of the base
configuration information (including the location of the data
directories) in one file, and it makes perfect sense to me for the
location of the data directory to be a GUC variable.

I'd say the only thing the postmaster needs to know prior to startup
is the directory containing the postgresql.conf file.  An
administrator who wishes to set up multiple independent databases can
easily do so by using multiple config file directories.  When
consistency is required, he can easily use symlinks to point to master
config files where appropriate.


I assume $PGDATA was around long before GUC?


-- 
Kevin Brown   [EMAIL PROTECTED]




Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Dann Corbit
> -----Original Message-----
> From: ow [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 11, 2003 10:32 PM
> To: Shridhar Daithankar [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
>
> There's The Open Source Database Benchmark,
> http://osdb.sourceforge.net/.
>
> Anyone tried to use it?
 

Requires a real Linux or UNIX installation.  Won't run under Cygwin.  At
least I could not get it to work.  Got it compiled, but I am missing
something to make it work correctly.

Here are some other benchmark things:
http://www.eweek.com/article2/0,3959,293,00.asp
Which has this link:
ftp://ftp.eweek.com/pub/eweek/pdf/printpub/benchmark/dbbenchmark_v1.zip

This article:
http://www.dcc.unicamp.br/~celio/mc527/interbase/PCweek_test.html
Has this code:
ftp://ftp.zdnet.com/pcweek/labs/0207codefile.zip
But it requires the very expensive "Benchmark Factory" product.  It
might be useful to people who already have Benchmark Factory.

Another open source database test:
http://sourceforge.net/projects/osdldbt


Java database benchmark routines:
http://www.firstsql.com/firstsqlj/


This site:
http://www.mipt.sw.ru/
(which does not appear to be available right now)
Has three excellent benchmark tools, ATS, LTS, OTS.
They are Windows machine centric, though, and would probably be very
hard to port to Unix environments.




Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Dann Corbit
A financial database benchmark:
http://www.cs.nyu.edu/cs/faculty/shasha/fintime.html




Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Dann Corbit
This is how to get the FIPS benchmark.  It measures CONFORMANCE rather
than performance:
http://www.itl.nist.gov/div897/ctg/sql_form.htm




Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Dann Corbit
The benchmark handbook:
http://www.benchmarkresources.com/handbook/contents.asp




Re: [HACKERS] Hash grouping, aggregates

2003-02-11 Thread Hannu Krosing
Tom Lane kirjutas T, 11.02.2003 kell 18:39:
> Bruno Wolff III [EMAIL PROTECTED] writes:
> > Tom Lane [EMAIL PROTECTED] wrote:
> > > Greg Stark [EMAIL PROTECTED] writes:
> > > > The neat thing is that hash aggregates would allow grouping on data types that
> > > > have = operators but no useful < operator.
> > >
> > > Hm.  Right now I think that would barf on you, because the parser wants
> > > to find the '<' operator to label the grouping column with, even if the
> > > planner later decides not to use it.  It'd take some redesign of the
> > > query data structure (specifically SortClause/GroupClause) to avoid that.
> >
> > I think another issue is that for some = operators you still might not
> > be able to use a hash. I would expect the discussion for hash joins in
> > http://developer.postgresql.org/docs/postgres/xoper-optimization.html
> > would apply to hash aggregates as well.
>
> Right, the = operator must be hashable or you're out of luck.  But we
> could imagine tweaking the parser to allow GROUP BY if it finds a
> hashable = operator and no sort operator.  The only objection I can see
> to this is that it means the planner *must* use hash aggregation, which
> might be a bad move if there are too many distinct groups.

If we run out of sort memory, we can always bail out later, preferably
with a descriptive error message. It is not as elegant as erring out at
parse (or even plan/optimise) time, but the result is /almost/ the same.

Relying on hash aggregation will become essential if we are ever going
to implement the other groupings (CUBE, ROLLUP, (), ...), so it would
be nice if hash aggregation could also overflow to disk - I suspect that
this will still be faster than running an independent scan for each
GROUP BY grouping and merging the results.

-
Hannu





Re: [HACKERS] location of the configuration files

2003-02-11 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
> I assume $PGDATA was around long before GUC?

Yes, it was.  But I have not yet seen an argument here that justifies
why $SOMECONFIGDIRECTORY/postgresql.conf is better than
$PGDATA/postgresql.conf.  The latter keeps all the related files
together.  The former seems only to introduce unnecessary complexity.
You can only justify it as simpler if you propose hardwiring a value for
$SOMECONFIGDIRECTORY ... which is a proposal that will not fly with any
of the core developers, because we all run multiple versions of Postgres
on our machines so that we can deal with back-version bug reports,
test installations, etc.  It is unlikely to fly with any of the RPM
packagers either, due to the wildly varying ideas out there about the
One True Place where applications should put their config files.

(This point was pretty much why mlw's previous proposal was rejected,
IIRC.)

regards, tom lane




Re: [HACKERS] PostgreSQL Tuning Results

2003-02-11 Thread Hans-Jürgen Schönig
Gavin Sherry wrote:

> Hi Chris,
>
> On Wed, 12 Feb 2003, Christopher Kings-Lynne wrote:
>
> > Machine:
> > 256MB RAM, FreeBSD 4.7, EIDE HDD, < 1 Ghz
>
> Seems like a small amount of memory to be running memory-based tests with.
>
> What about testing sort_mem as well. It would seem to me that there
> would be no negative to having infinite sort_mem given infinite memory,
> though.
>
> Gavin


Be careful with sort_mem - this might lead to VERY unexpected results. I 
did some testing on my good old Athlon 500 with a brand new IBM 120 Gigs 
HDD. Reducing the sort_mem gave me significantly faster results when 
sorting/indexing 20,000,000 random rows.
However, it would be nice to see the results of concurrent sorts.

   Hans






--
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Peter Eisentraut
Tom Lane writes:

> We could retarget to try to stay under SHMMAX=4M, which I think is
> the next boundary that's significant in terms of real-world platforms
> (isn't that the default SHMMAX on some BSDen?).  That would allow us
> 350 or so shared_buffers, which is better, but still not really a
> serious choice for production work.

What is a serious choice for production work?  And what is the ideal
choice?  The answer probably involves some variables, but maybe we should
get values for those variables in each case and work from there.

-- 
Peter Eisentraut   [EMAIL PROTECTED]





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Josh Berkus
Tom, Justin,

> What I would really like to do is set the default shared_buffers to
> 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
> more-realistic settings for FSM size, we'd probably be talking a shared
> memory request approaching 16 meg.  This is not enough RAM to bother
> any modern machine from a performance standpoint, but there are probably
> quite a few platforms out there that would need an increase in their
> stock SHMMAX kernel setting before they'd take it.

What if we supplied several sample .conf files, and let the user choose which 
to copy into the database directory?   We could have a "high read 
performance" profile, and a "transaction database" profile, and a 
"workstation" profile, and a "low impact" profile.   We could even supply a 
Perl script that would adjust SHMMAX and SHMALL on platforms where this can 
be done from the command line.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
> What if we supplied several sample .conf files, and let the user choose which
> to copy into the database directory?   We could have a "high read
> performance" profile, and a "transaction database" profile, and a
> "workstation" profile, and a "low impact" profile.

Uh ... do we have a basis for recommending any particular sets of
parameters for these different scenarios?  This could be a good idea
in the abstract, but I'm not sure I know enough to fill in the details.

A lower-tech way to accomplish the same result is to document these
alternatives in postgresql.conf comments and encourage people to review
that file, as Steve Crawford just suggested.  But first we need the raw
knowledge.

regards, tom lane




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Justin Clift
Josh Berkus wrote:

> Tom, Justin,
>
> <snip>
>
> What if we supplied several sample .conf files, and let the user choose which
> to copy into the database directory?   We could have a "high read
> performance" profile, and a "transaction database" profile, and a
> "workstation" profile, and a "low impact" profile.   We could even supply a
> Perl script that would adjust SHMMAX and SHMALL on platforms where this can
> be done from the command line.


This might have value as the next step in the process of:

a) Are we going to have better defaults?

or

b) Let's stick with the current approach.


If we decide to go with better (changed) defaults, we may also be able 
to figure out a way of having profiles that could optionally be chosen from.

As a longer term thought, it would be nice if the profiles weren't just 
hard-coded example files, but more of:

pg_autotune --setprofile=xxx

Or similar utility, and it did all the work.  Named profiles being one 
capability, and other tuning measurements (i.e. cpu costings, disk 
performance profiles, etc) being the others.

Regards and best wishes,

Justin Clift

--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi




Re: [HACKERS] Status report: regex replacement

2003-02-11 Thread Peter Eisentraut
Tatsuo Ishii writes:

> > UTF-8 seems to be the most popular, but even XML standard requires all
> > compliant implementations to deal with at least both UTF-8 and UTF-16.
>
> I don't think PostgreSQL is going to natively support UTF-16.

At FOSDEM it was claimed that Windows natively uses UCS-2, and there are
also continuing rumours that the Java Unicode encoding is not quite UTF-8,
so there is going to be a certain pressure to support other Unicode
encodings besides UTF-8.

As for the names, the SQL standard defines most of those.

-- 
Peter Eisentraut   [EMAIL PROTECTED]





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Justin Clift
Tom Lane wrote:

> <snip>
>
> Uh ... do we have a basis for recommending any particular sets of
> parameters for these different scenarios?  This could be a good idea
> in the abstract, but I'm not sure I know enough to fill in the details.
>
> A lower-tech way to accomplish the same result is to document these
> alternatives in postgresql.conf comments and encourage people to review
> that file, as Steve Crawford just suggested.  But first we need the raw
> knowledge.


Without too much hacking around, you could pretty easily adapt the 
pg_autotune code to do proper profiles of a system with different settings.

i.e. increment one setting at a time, run pgbench on it with some decent 
amount of transactions and users, stuff the results into a different 
database.  Aggregate data over time kind of thing.  Let it run for a 
week, etc.

If it's helpful, there's a 100% spare Athlon 1.6Ghz box around with 
(choose your OS) + Adaptec 29160 + 512MB RAM + 2 x 9GB Seagate Cheetah 
10k rpm drives hanging around.  No stress to set that up and let it run 
any long terms tests you'd like plus send back results.

Regards and best wishes,

Justin Clift




--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi





Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Josh Berkus
Tom, Justin,

> Uh ... do we have a basis for recommending any particular sets of
> parameters for these different scenarios?  This could be a good idea
> in the abstract, but I'm not sure I know enough to fill in the details.

Sure.

Mostly-read database, few users, good hardware, complex queries =
high shared buffers and sort mem, high geqo and join collapse thresholds,
moderate FSM settings, defaults for WAL.

Same as above with many users and simple queries (webserver) =
same as above, except lower sort mem and higher connection limit.

High-transaction database =
moderate shared buffers and sort mem, high FSM settings, increased WAL files
and buffers.

Workstation =
moderate to low shared buffers and sort mem, moderate FSM, defaults for WAL,
etc.

Low-impact server = current defaults, more or less.
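
To make one of these concrete, a postgresql.conf fragment for the
"webserver" profile might look something like this -- all values are
invented for discussion, not recommendations:

```
# hypothetical "webserver" profile -- illustrative values only
max_connections = 200
shared_buffers = 4096          # 32 MB at 8 KB pages
sort_mem = 2048                # KB per sort; keep low with many backends
max_fsm_pages = 50000
wal_buffers = 8
effective_cache_size = 10000   # 8 KB pages the OS is expected to cache
```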

While none of these settings will be *perfect* for anyone, they will be 
considerably better than what's shipping with postgresql.   And, based on my 
"Learning Perl" knowledge, I'm pretty sure I could write the program.  

All we'd need to do is argue out, on the PERFORMANCE list, what's a good value 
for each profile.  That's the tough part.  The Perl script is easy.

> A lower-tech way to accomplish the same result is to document these
> alternatives in postgresql.conf comments and encourage people to review
> that file, as Steve Crawford just suggested.  But first we need the raw
> knowledge.

That's also not a bad approach ... the CONF file should be more heavily 
commented, period, regardless of what approach we take.  I volunteer to work 
on this with other participants.

> Without too much hacking around, you could pretty easily adapt the
> pg_autotune code to do proper profiles of a system with different settings.

No offense, Justin, but I don't know anyone else who's gotten your pg_autotune 
script to run other than you.  And pg_bench has not been a useful performance 
measure for any real database server I have worked on so far.

I'd be glad to help improve pg_autotune,  with two caveats:
1) We will still need to figure out the profiles above so that we have 
decent starting values.
2) I suggest that we do pg_autotune in Perl or Python or another higher-level 
language.   This would enable several performance buffs who don't do C to 
contribute to it, and a performance-tuning script is a higher-level-language 
sort of function, anyway.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Tom Lane
Justin Clift [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > Uh ... do we have a basis for recommending any particular sets of
> > parameters for these different scenarios?  This could be a good idea
> > in the abstract, but I'm not sure I know enough to fill in the details.
>
> Without too much hacking around, you could pretty easily adapt the
> pg_autotune code to do proper profiles of a system with different settings.
>
> i.e. increment one setting at a time, run pgbench on it with some decent
> amount of transactions and users, stuff the results into a different
> database.

If I thought that pgbench was representative of anything, or even
capable of reliably producing repeatable numbers, then I might subscribe
to results derived this way.  But I have little or no confidence in
pgbench.  Certainly I don't see how you'd use it to produce
recommendations for a range of application scenarios, when it's only
one very narrow scenario itself.

regards, tom lane




Re: [HACKERS] 7.2 - 7.3 incompatibility

2003-02-11 Thread Hannu Krosing
On Mon, 2003-02-10 at 19:13, Peter Eisentraut wrote:
> Christopher Kings-Lynne writes:
>
> > I found an example of a casting problem in our source code now that we're
> > running 7.3:
> >
> > SELECT CURRENT_DATE - EXTRACT(DOW FROM CURRENT_DATE);
>
> A mathematically sound way to write this would be:
>
> select current_date - extract(dow from current_date) * interval '1 day';
>
> > I'm not sure really why DOW needs to be double precision, but hey...
>
> Extract returns double precision.  It can't morph itself based on the
> argument type at run time.

Is this mandated by ANSI?

PostgreSQL _does_ select the function based on argument types:

hannu=# create function f(int) returns int as 'select 1' language 'sql';
CREATE
hannu=# create function f(int,int) returns float as 'select 3.1415927'
language 'sql';
CREATE
hannu=# select f(1),f(1,1);
 f | f 
---+---
 1 | 3.1415927
(1 row)

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] pgbash-7.3 released

2003-02-11 Thread Oliver Elphick
On Tue, 2003-02-11 at 08:46, SAKAIDA Masaaki wrote:
> I'm pleased to announce the release of pgbash-7.3.
> http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html

How do the non-PostgreSQL features of pgbash relate to standard bash? 
Do you also keep up to date with new releases of bash? or is there no
connection?

-- 
Oliver Elphick[EMAIL PROTECTED]
Isle of Wight, UK http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
 
 Make a joyful noise unto the LORD, all ye lands. 
  Serve the LORD with gladness; come before his presence
  with singing. Know ye that the LORD he is God; it is 
  he that hath made us, and not we ourselves; we are his
  people, and the sheep of his pasture.   
Psalms 100:1-3 





Re: [HACKERS] pgbash-7.3 released

2003-02-11 Thread SAKAIDA Masaaki

Oliver Elphick [EMAIL PROTECTED] wrote:

> On Tue, 2003-02-11 at 08:46, SAKAIDA Masaaki wrote:
> > I'm pleased to announce the release of pgbash-7.3.
> > http://www.psn.co.jp/PostgreSQL/pgbash/index-e.html
>
> How do the non-PostgreSQL features of pgbash relate to standard bash?

Please see:
http://www.psn.co.jp/PostgreSQL/pgbash/usage/usage11-e.html

> Do you also keep up to date with new releases of bash?

Yes, I would like to. The newest release of bash is 'bash-2.05b', 
and I have already made a patch for it. But 'bash-2.05b' 
seems to have a bug in saving array shell variables, 
so pgbash-7.3 could not be released with the bash-2.05b patch.

--
SAKAIDA Masaaki 
(Sorry, I am not good at English.)







Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Matthew T. O'Connor
On Tue, 2003-02-11 at 13:01, Tom Lane wrote:
> Jon Griffin [EMAIL PROTECTED] writes:
> > So it appears that linux at least is way above your 8 meg point, unless I
> > am missing something.
>
> Yeah, AFAIK all recent Linuxen are well above the range of parameters
> that I was suggesting (and even if they weren't, Linux is particularly
> easy to change the SHMMAX setting on).  It's other Unixoid platforms
> that are likely to have a problem.  Particularly the ones where you
> have to rebuild the kernel to change SHMMAX; people may be afraid to
> do that.

The issue as I see it is: 
"better performing" vs. "more compatible" out-of-the-box defaults.

Perhaps a compromise (hack?):
Set the default to some value that performs well, a value we all
agree is not too big (16M? 32M?). On startup, if the OS can't give us
what we want, instead of failing, we can try again with a smaller
amount, perhaps half the default; if that fails, try again with half
until we reach some bottom threshold (1M?).

The argument against this might be: "When I set shared_buffers=X, I want
X shared buffers. I don't want it to fail silently and give me less than
what I need / want."  To address this we might want to add a guc option
that controls this behavior. So we ship postgresql.conf with 32M of
shared memory and auto_shared_mem_reduction = true, with a comment that
the administrator might want to turn this off for production.

Thoughts?  

I think this will allow most uninformed users to get decent performing
defaults, as most systems will accommodate this larger value.





Re: Changing the default configuration (was Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks)

2003-02-11 Thread Tom Lane
Matthew T. O'Connor [EMAIL PROTECTED] writes:
> ... So we ship postgresql.conf with 32M of
> shared memory and auto_shared_mem_reduction = true.  With a comment that
> the administrator might want to turn this off for production.

This really doesn't address Justin's point about clueless benchmarkers,
however.  In fact I fear it would make that problem worse: if Joe Blow
says he got horrible performance, who knows whether he was running with
a reasonable number of buffers or not?  Especially when you ask him
"did you have lots of shared buffers?" and he responds "yes, of course,
it says 32M right here."

We've recently been moving away from the notion that it's okay to
silently lose functionality in order to run on a given system.  For
example, if you want to install without readline, you now have to
explicitly tell configure that, because we heard "why don't I have
history in psql" way too often from people who just ran configure
and paid no attention to what it told them.

I think that what this discussion is really leading up to is that we
are going to decide to apply the same principle to performance.  The
out-of-the-box settings ought to give reasonable performance, and if
your system can't handle it, you should have to take explicit action
to acknowledge the fact that you aren't going to get reasonable
performance.

regards, tom lane



