Re: [HACKERS] Bad timestamp external representation

2001-07-26 Thread Nathan Myers

On Thu, Jul 26, 2001 at 05:38:23PM -0400, Bruce Momjian wrote:
 Nathan Myers wrote:
  Bruce wrote:
   
   I can confirm that current CVS sources have the same bug.
   
It's a bug in timestamp output.

# select '2001-07-24 15:55:59.999'::timestamp;
 ?column?  
---
 2001-07-24 15:55:60.00-04
(1 row)

Richard Huxton wrote:
 
 From: tamsin [EMAIL PROTECTED]
 
  Hi,
 
  Just created a db from a pg_dump file and got this error:
 
  ERROR:  copy: line 602, Bad timestamp external representation 
  '2000-10-03 09:01:60.00+00'
 
  I guess it's a bad representation because 09:01:60.00+00
  is actually 09:02, but how could it have got into my
  database/can I do anything about it? The value must have
  been inserted by my app via JDBC, I can't insert that value
  directly via psql.

 Seem to remember a bug in either pg_dump or timestamp
 rendering causing rounding-up problems like this. If no-one
 else comes up with a definitive answer, check the list
 archives. If you're not running the latest release, check the
 change-log.
 
  It is not a bug, in general, to generate or accept times like
  09:01:60. Leap seconds are inserted as the 60th second of a minute.
  ANSI C defines the range of struct member tm.tm_sec as "seconds
  after the minute [0-61]", inclusive, and strftime format %S as "the
  second as a decimal number (00-61)". A footnote mentions that the
  range [0-61] for tm_sec allows for as many as two leap seconds.
 
  This is not to say that pg_dump should misrepresent stored times,
  but rather that PG should not reject those misrepresented times as
  being ill-formed. We were lucky that PG has the bug which causes it
  to reject these times, as it led to the other bug in pg_dump being
  noticed.

 We should accept :60 seconds but we should round 59.99 to 1:00, right?

If the xx:59.999 occurred immediately before a leap second, rounding it
up to (xx+1):00.00 would introduce an error of 1.001 seconds.

As I understand it, the problem is in trying to round 59.999 to two
digits.  My question is, why is pg_dump representing times with less 
precision than PostgreSQL's internal format?  Should pg_dump be lossy?
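The two-digit rounding at issue is easy to reproduce; a minimal sketch (plain float formatting, nothing PostgreSQL-specific):

```python
# Formatting 59.999 seconds to two decimal places rounds up past the
# end of the minute, producing the "60.00" seen in the rejected dump.
seconds = 59.999
text = f"{seconds:.2f}"
print(text)  # -> 60.00
```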

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Re: RPM source files should be in CVS (was Re: [GENERAL] psql -l)

2001-07-20 Thread Nathan Myers

On Fri, Jul 20, 2001 at 07:05:46PM -0400, Trond Eivind Glomsrød wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  BTW, the only python shebangs I can find in CVS look like
  #! /usr/bin/env python
  Isn't that OK on RedHat?
 
 It is.

Probably the perl scripts should say, likewise, 

  #!/usr/bin/env perl
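The point of the `env` form is that the interpreter is resolved through PATH rather than at a hard-coded location. A small sketch (using `echo` as a stand-in for perl):

```python
import subprocess

# "#!/usr/bin/env perl" works because env(1) looks the named program
# up via PATH; here we let env resolve "echo" the same way.
out = subprocess.run(["/usr/bin/env", "echo", "found via PATH"],
                     capture_output=True, text=True)
print(out.stdout.strip())  # -> found via PATH
```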

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 11:45:54AM -0400, Bruce Momjian wrote:
  And this press release
  
  http://www.nusphere.com/releases/071601.htm
 ...
 On a more significant note, I hear the word "fork" clearly suggested
 in that text.  It is almost like MySQL AB GPL'ed the MySQL code and
 now they may not be able to keep control of it.

Anybody is free to fork MySQL or PostgreSQL alike.  The only difference
is that all published MySQL forks must remain public, where PostgreSQL 
forks need not.  MySQL AB is demonstrating their legal right to keep as
much control as they choose, and NuSphere will lose if it goes to court.

The interesting event here is that since NuSphere violated the license 
terms, they no longer have any rights to use or distribute the MySQL AB 
code, and won't until they get forgiveness from MySQL AB.  MySQL AB 
would be within their rights to demand that the copyright to Gemini be 
signed over, before offering forgiveness.

If Red Hat forks PostgreSQL, nobody will have any grounds for complaint.
(It's been forked lots of times already, less visibly.)

Nathan Myers 
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 08:35:58AM -0400, Jan Wieck wrote:
 And this press release
 
 http://www.nusphere.com/releases/071601.htm
 
 also  explains why they had to do it this way.

They were always free to fork, but doing it the way they did --
violating MySQL AB's license -- they shot the dog.

The lesson?  Ask somebody competent, first, before you bet your
company playing license games.

Nathan Myers
[EMAIL PROTECTED]




[HACKERS] dependent dependants

2001-07-18 Thread Nathan Myers


For the record:

  http://www.lineone.net/dictionaryof/englishusage/d0081889.html

dependent or dependant

  Dependent is the adjective, used for a person or thing that depends
  on someone or something: Admission to college is dependent on A-level
  results. Dependant is the noun, and is a person who relies on someone
  for financial support: Do you have any dependants?

This is not for mailing-list pedantry, but just to make sure 
that the right spelling gets into the code.  (The page mentioned 
above was found by entering "dependent dependant" into Google.)

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MySQL Gemini code

2001-07-18 Thread Nathan Myers

On Wed, Jul 18, 2001 at 06:37:48PM -0400, Trond Eivind Glomsrød wrote:
 Michael Widenius [EMAIL PROTECTED] writes:
  Assigning over the code is also something that FSF requires for all
  code contributions.  If you criticize us at MySQL AB, you should
  also criticize the above.
 
 This is slightly different - FSF wants it so it will have a legal
 position to defend its programs: ...
 MySQL and TrollTech requires copyright assignment in order to sell
 non-open licenses. Some people will have a problem with this, while
 not having a problem with the FSF copyright assignment.

Nobody who works on MySQL is unaware of MySQL AB's business model.
Anybody who contributes to the core server has to expect that MySQL 
AB will need to relicense anything accepted into the core; that's 
their right as originators.  Everybody who contributes has a choice 
to make: fork, or sign over.  (With the GPL, forking remains possible;
Apple and Sun community licenses don't allow it.)

Anybody who contributes to PG has to make the same choice: fork, 
or put your code under the PG license.  The latter choice is 
equivalent to signing over to all proprietary vendors, who are 
then free to take your code proprietary.  Some of us like that.

  I had actually hoped to get support from you guys at PostgreSQL
  regarding this.  You may have similar experience or at least
  understand our position. The RedHat database may be a good thing
  for PostgreSQL, but I am not sure if it's a good thing for RedHat
  or for the main developers to PostgreSQL. 
 
 This isn't even a remotely similar situation: ...

It's similar enough.  One difference is that PG users are less
afraid to fork.  Another is that without the GPL, we have elected 
not to (and indeed cannot) stop any company from doing with PG what 
NuSphere is doing with MySQL.

This is why characterizing the various licenses as more or less
business-friendly is misleading (i.e. dishonest) -- it evades the 
question, "friendly to whom?"  Businesses sometimes compete...

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-17 Thread Nathan Myers

On Thu, Jul 12, 2001 at 11:08:34PM +0200, Peter Eisentraut wrote:
 Nathan Myers writes:
 
  When the system is too heavily loaded (however measured), any further
  login attempts will fail.  What I suggested is, instead of the
  postmaster accept()ing the connection, why not leave the connection
  attempt in the queue until we can afford a back end to handle it?
 
 Because the new connection might be a cancel request.

Supporting cancel requests seems like a poor reason to ignore what
load-shedding support operating systems provide.  

To support cancel requests, it would suffice for PG to listen at 
another socket dedicated to administrative requests.  (It might 
even ignore MaxBackends for connections on that socket.)
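A minimal sketch of the two-socket idea (a hypothetical layout, not how the postmaster is actually structured): cancel requests arrive on a dedicated listener that select() watches alongside the ordinary one, so they can be serviced even when ordinary connections are being refused.

```python
import select
import socket

def make_listener():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))   # ephemeral port, for the sketch only
    s.listen(5)
    return s

main_sock = make_listener()    # ordinary client connections
admin_sock = make_listener()   # cancel/administrative requests only

# A cancel request connects to the admin socket...
client = socket.create_connection(admin_sock.getsockname())

# ...and select() reports which listener needs service.
ready, _, _ = select.select([main_sock, admin_sock], [], [], 1.0)
assert admin_sock in ready and main_sock not in ready
conn, _ = admin_sock.accept()  # serviced even if MaxBackends is reached
```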

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-16 Thread Nathan Myers

On Sat, Jul 14, 2001 at 11:38:51AM -0400, Tom Lane wrote:
 
 The state of affairs in current sources is that the listen queue
 parameter is MIN(MaxBackends * 2, PG_SOMAXCONN), where PG_SOMAXCONN
 is a constant defined in config.h --- it's 10000, hence a non-factor,
 by default, but could be reduced if you have a kernel that doesn't
 cope well with large listen-queue requests.  We probably won't know
 if there are any such systems until we get some field experience with
 the new code, but we could have configure select a platform-dependent
 value if we find such problems.

Considering the Apache comment about some systems truncating instead
of limiting... 10000 & 0xff is 16.  Maybe 10239 would be a better choice, 
or 16383.  
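The arithmetic behind those suggested values, for a system that keeps only the low byte of the backlog:

```python
# If listen()'s backlog is truncated to its low 8 bits, 10000 degrades
# to a tiny queue, while 10239 and 16383 both survive as the max, 255.
for backlog in (10000, 10239, 16383):
    print(backlog, "->", backlog & 0xFF)
# 10000 -> 16
# 10239 -> 255
# 16383 -> 255
```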

 So, having thought that through, I'm still of the opinion that holding
 off accept is of little or no benefit to us.  But it's not as simple
 as it looks at first glance.  Anyone have a different take on what the
 behavior is likely to be?

After doing some more reading, I find that most OSes do not reject
connect requests that would exceed the specified backlog; instead,
they ignore the connection request and assume the client will retry 
later.  Therefore, it appears we cannot use a small backlog to shed load 
unless we assume that clients will time out quickly by themselves.

OTOH, maybe it's reasonable to assume that clients will time out,
and that in the normal case authentication happens quickly.

Then we can use a small listen() backlog, and never accept() if we
have more than MaxBackends back ends.  The OS will keep a small queue
corresponding to our small backlog, and the clients will do our load 
shedding for us.
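A toy sketch of that policy (MAX_BACKENDS and BACKLOG are stand-in names, not PostgreSQL internals): accept() only while a back-end slot is free, and let the kernel's short queue hold or shed the rest.

```python
import socket

MAX_BACKENDS = 2   # stand-in for PostgreSQL's MaxBackends
BACKLOG = 2        # deliberately small listen() backlog

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(BACKLOG)

active = []  # connections currently served by "back ends"

def maybe_accept():
    """accept() only when under the limit; otherwise the attempt
    waits in (or is shed from) the kernel's short listen queue."""
    if len(active) >= MAX_BACKENDS:
        return None
    conn, _ = srv.accept()
    active.append(conn)
    return conn

client = socket.create_connection(srv.getsockname())
assert maybe_accept() is not None   # a slot was free: accepted
```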

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN

2001-07-13 Thread Nathan Myers

On Fri, Jul 13, 2001 at 10:36:13AM +0200, Zeugswetter Andreas SB wrote:
 
  When the system is too heavily loaded (however measured), any further 
  login attempts will fail.  What I suggested is, instead of the 
  postmaster accept()ing the connection, why not leave the connection 
  attempt in the queue until we can afford a back end to handle it?  
 
 Because the clients would time out ?

It takes a long time for half-open connections to time out, by default.
Probably most clients would time out, themselves, first, if PG took too
long to get to them.  That would be a Good Thing.

Once the SOMAXCONN threshold is reached (which would only happen when 
the system is very heavily loaded, because when it's not then nothing 
stays in the queue for long), new connection attempts would fail 
immediately, another Good Thing.  When the system is very heavily 
loaded, we don't want to spare attention for clients we can't serve.

  Then, the argument to listen() will determine how many attempts can 
  be in the queue before the network stack itself rejects them without 
  the postmaster involved.
 
 You cannot change the argument to listen() at runtime, or are you suggesting
 to close and reopen the socket when maxbackends is reached ? I think 
 that would be nonsense.

Of course that would not work, and indeed nobody suggested it.

If postmaster behaved a little differently, not accept()ing when
the system is too heavily loaded, then it would be reasonable to
call listen() (once!) with PG_SOMAXCONN set to (e.g.) N=20.  

Where the system is not too heavily-loaded, the postmaster accept()s
the connection attempts from the queue very quickly, and the number
of half-open connections never builds up to N.  (This is how PG has
been running already, under light load -- except that on Solaris with 
Unix sockets N has been too small.)

When the system *is* heavily loaded, the first N attempts would be 
queued, and then the OS would automatically reject the rest.  This 
is better than accept()ing any number of attempts and then refusing 
to authenticate.  The N half-open connections in the queue would be 
picked up by postmaster as existing back ends drop off, or time out 
and give up if that happens too slowly.  

 I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is no
 use in accepting more than your total allowed connections concurrently.

That might not have the effect you imagine, where many short-lived
connections are being made.  In some cases it would mean that clients 
are rejected that could have been served after a very short delay.

Nathan Myers
[EMAIL PROTECTED]




[HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-13 Thread Nathan Myers

On Fri, Jul 13, 2001 at 07:53:02AM -0400, mlw wrote:
 Zeugswetter Andreas SB wrote:
  I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is no use in
  accepting more than your total allowed connections concurrently.
 
 I have been following this thread and I am confused why the queue
 argument to listen() has anything to do with Max backends. All the
 parameter to listen does is specify how long a list of sockets open
 and waiting for connection can be. It has nothing to do with the
 number of back end sockets which are open.

Correct.

 If you have a limit of 128 back end connections, and you have 127
 of them open, a listen with queue size of 128 will still allow 128
 sockets to wait for connection before turning others away.

Correct.

 It should be a parameter based on the time out of a socket connection
 vs the ability to answer connection requests within that period of
 time.

It's not really meaningful at all, at present.

 There are two was to think about this. Either you make this parameter
 tunable to give a proper estimate of the usability of the system, i.e.
 tailor the listen queue parameter to reject sockets when some number
 of sockets are waiting, or you say no one should ever be denied,
 accept everyone and let them time out if we are not fast enough.

 This debate could go on, why not make it a parameter in the config
 file that defaults to some system variable, i.e. SOMAXCONN.

With postmaster's current behavior there is no benefit in setting
the listen() argument to anything less than 1000.  With a small
change in postmaster behavior, a tunable system variable becomes
useful.

But using SOMAXCONN blindly is always wrong; that is often 5, which
is demonstrably too small.

 BTW: on linux, the backlog queue parameter is silently truncated to
 128 anyway.

The 128 limit is common, applied on BSD and Solaris as well.
It will probably increase in future releases.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)

2001-07-11 Thread Nathan Myers

On Wed, Jul 11, 2001 at 12:26:43PM -0400, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Tom Lane writes:
  Right.  Okay, it seems like just making it a hand-configurable entry
  in config.h.in is good enough for now.  When and if we find that
  that's inadequate in a real-world situation, we can improve on it...
 
  Would anything computed from the maximum number of allowed connections
  make sense?
 
 [ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive
 at the listen(), so it'd be possible to use MaxBackends to compute the
 parameter.  Offhand I would think that MaxBackends or at most
 2*MaxBackends would be a reasonable value.

 Question, though: is this better than having a hardwired constant?
 The only case I can think of where it might not be is if some platform
 out there throws an error from listen() when the parameter is too large
 for it, rather than silently reducing the value to what it can handle.
 A value set in config.h.in would be simpler to adapt for such a platform.

The question is really whether you ever want a client to get a
rejected result from an open attempt, or whether you'd rather they 
got a report from the back end telling them they can't log in.  The 
second is more polite but a lot more expensive.  That expense might 
really matter if you have MaxBackends already running.

I doubt most clients have tested either failure case more thoroughly 
than the other (or at all), but the lower-level code is more likely 
to have been cut-and-pasted from well-tested code. :-)

Maybe PG should avoid accept()ing connections once it has MaxBackends
back ends already running (as hinted at by Ian), so that the listen()
parameter actually has some meaningful effect, and excess connections 
can be rejected more cheaply.  That might also make it easier to respond 
more adaptively to true load than we do now.

 BTW, while I'm thinking about it: why doesn't pqcomm.c test for a
 failure return from the listen() call?  Is this just an oversight,
 or is there a good reason to ignore errors?

The failure of listen() seems impossible.  In the Linux, NetBSD, and 
Solaris man pages, none of the error returns mentioned are possible 
with PG's current use of the function.  It seems as if the most that 
might be needed now would be to add a comment to the call to socket() 
noting that if any other address families are supported (besides 
AF_INET and AF_LOCAL aka AF_UNIX), the call to listen() might need to 
be looked at.  AF_INET6 (which PG will need to support someday)
doesn't seem to change matters.

Probably if listen() did fail, then one or other of bind(), accept(),
and read() would fail too.

Nathan Myers
[EMAIL PROTECTED]




Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)

2001-07-10 Thread Nathan Myers

On Tue, Jul 10, 2001 at 05:06:28PM -0400, Bruce Momjian wrote:
  Mathijs Brands [EMAIL PROTECTED] writes:
   OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
   be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
   tests on two different Sparc boxes (Solaris 7 and 8). The regression
   test still fails, but for a different reason. The abstime test fails;
   not only on Solaris but also on FreeBSD (4.3-RELEASE).
  
  The abstime diff is to be expected (if you look closely, the test is
  comparing 'current' to 'June 30, 2001'.  Ooops).  If that's the only
  diff then you are in good shape.
  
  
  Based on this and previous discussions, I am strongly tempted to remove
  the use of SOMAXCONN and instead use, say,
  
  #define PG_SOMAXCONN 1000
  
  defined in config.h.in.  That would leave room for configure to twiddle
  it, if that proves necessary.  Does anyone know of a platform where this
  would cause problems?  AFAICT, all versions of listen(2) are claimed to
  be willing to reduce the passed parameter to whatever they can handle.
 
 Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
 is less than 1000?

All the OSes we know of fold it to 128, currently.  We can jump it 
to 10240 now, or later when there are 20GHz CPUs.

If you want to make it more complicated, it would be more useful to 
be able to set the value lower for runtime environments where PG is 
competing for OS resources with another daemon that deserves higher 
priority.

Nathan Myers
[EMAIL PROTECTED]




Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)

2001-07-10 Thread Nathan Myers

On Tue, Jul 10, 2001 at 06:36:21PM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  All the OSes we know of fold it to 128, currently.  We can jump it 
  to 10240 now, or later when there are 20GHz CPUs.
 
  If you want to make it more complicated, it would be more useful to 
  be able to set the value lower for runtime environments where PG is 
  competing for OS resources with another daemon that deserves higher 
  priority.
 
 Hmm, good point.  Does anyone have a feeling for the amount of kernel
 resources that are actually sucked up by an accept-queue entry?  If 128
 is the customary limit, is it actually worth worrying about whether
 we are setting it to 128 vs. something smaller?

I don't think the issue is the resources that are consumed by the 
accept-queue entry.  Rather, it's a tuning knob to help shed load 
at the entry point to the system, before significant resources have 
been committed.  An administrator would tune it according to actual
system and traffic characteristics.

It is easy enough for somebody to change, if they care, that it seems 
to me we have already devoted more time to it than it deserves right now.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Backup and Recovery

2001-07-09 Thread Nathan Myers

On Fri, Jul 06, 2001 at 06:52:49AM -0400, Bruce Momjian wrote:
 Nathan wrote:
  How hard would it be to turn these row records into updates against a 
  pg_dump image, assuming access to a good table-image file?
 
 pg_dump is very hard because WAL contains only tids.  No way to match
 that to pg_dump-loaded rows.

Maybe pg_dump can write out a mapping of TIDs to line numbers, and the
back-end can create a map of inserted records' line numbers when the dump 
is reloaded, so that the original TIDs can be traced to the new TIDs.
I guess this would require a new option on IMPORT.  I suppose the
mappings could be temporary tables.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] RE: Row Versioning, for jdbc updateable result sets

2001-06-15 Thread Nathan Myers

On Fri, Jun 15, 2001 at 10:21:37AM -0400, Tom Lane wrote:
 Dave Cramer [EMAIL PROTECTED] writes:
  I had no idea that xmin even existed, but having a quick look I think this
  is what I am looking for. Can I assume that if xmin has changed, then
  another process has changed the underlying data ?
 
 xmin is a transaction ID, not a process ID, but looking at it should
 work for your purposes at present.
 
 There has been talk of redefining xmin as part of a solution to the
 XID-overflow problem: what would happen is that all sufficiently old
 tuples would get relabeled with the same special xmin, so that only
 recent transactions would need to have distinguishable xmin values.
 If that happens then your code would break, at least if you want to
 check for changes just at long intervals.

A simpler alternative would be to change all sufficiently old tuples to 
have an xmin value, N, equal to the oldest that would need to be 
distinguished.  xmin values could then be compared using normal arithmetic: 
less(xminA, xminB) is just ((xminA - N) < (xminB - N)), with no special cases.
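A sketch of that arithmetic in a 32-bit XID space (N and the sample values below are hypothetical):

```python
MASK = 0xFFFFFFFF            # 32-bit transaction-ID space
N = 0xFFFFFF00               # oldest xmin still worth distinguishing

def less(xmin_a, xmin_b):
    # Subtracting N rebases both values, so ordinary unsigned
    # comparison works even across wraparound -- no special cases.
    return ((xmin_a - N) & MASK) < ((xmin_b - N) & MASK)

assert less(0xFFFFFF10, 0x00000005)       # pre-wrap XID is older
assert not less(0x00000005, 0xFFFFFF10)   # post-wrap XID is newer
```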

 A hack that comes to mind is that when relabeling an old tuple this way,
 we could copy its original xmin into cmin while setting xmin to the
 permanently-valid XID.  Then, if you compare both xmin and cmin, you
 have only about a 1 in 2^32 chance of being fooled.  (At least if we
 use a wraparound style of allocating XIDs.  I think Vadim is advocating
 resetting the XID counter to 0 at each system restart, so the active
 range of XIDs might be a lot smaller than 2^32 in that scenario.)

That assumes a pretty frequent system restart.  Many of us prefer
to code to the goal of a system that could run for decades.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 02:18:40PM -0400, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  I notice that the signal handlers in postmaster.c do quite a lot of work,
  much more than what they teach you in school they should do.
 
 Yes, they're pretty ugly.  However, we have not recently heard any
 complaints suggesting problems with it.  Since we block signals
 everywhere except just around the select() for new input, there's not
 really any risk of recursive resource use AFAICS.
 
  ISTM that most of these, esp. pmdie(), can be written more like the SIGHUP
  handler, i.e., set a global variable and evaluate right after the
  select().
 
 I would love to see it done that way, *if* you can show me a way to
 guarantee that the signal response will happen promptly.  AFAIK there's
 no portable way to ensure that we don't end up sitting and waiting for a
 new client message before we get past the select().  

It could open a pipe, and write(2) a byte to it in the signal handler, 
and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
Writing to and reading from your own pipe can be a recipe for deadlock, 
but here it would be safe if the signal handler knows not to get too far
ahead of select.  (The easy way would be to allow no more than one byte
in the pipe per signal handler.)

Of course this is still a system call in a signal handler, but it can't
(modulo coding bugs) fail.  See Stevens, "Unix Network Programming, 
Vol. 2: Interprocess Communication", p. 91, Figure 5.10, "Functions 
that are async-signal-safe".  The figure lists write() among others.
Sample code implementing the above appears on page 94.  Examples using 
other techniques (sigwait, nonblocking mq_receive) are presented also.

A pipe per backend might be considered pretty expensive.  Does UNIX 
allocate a pipe buffer before there's anything to put in it?
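A sketch of the self-pipe technique described above, in Python for brevity (in the postmaster this would be C around its select() loop): the handler does nothing but write() one byte, and select() then wakes promptly.

```python
import os
import select
import signal

r, w = os.pipe()
os.set_blocking(w, False)      # the handler must never block

pending = False

def handler(signum, frame):
    global pending
    if not pending:            # at most one byte per serviced signal
        pending = True
        os.write(w, b"\0")     # write() is async-signal-safe

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)   # deliver a signal to ourselves

# The main loop's select() returns immediately instead of sitting
# and waiting for client traffic.
ready, _, _ = select.select([r], [], [], 1.0)
assert r in ready
os.read(r, 1)                  # drain the byte; service the signal here
pending = False
```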

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 04:27:14PM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  It could open a pipe, and write(2) a byte to it in the signal handler, 
  and then have select(2) watch that pipe.  (SIGHUP could use the same pipe.)
  Of course this is still a system call in a signal handler, but it can't
  (modulo coding bugs) fail.
 
 Hm.  That's one way, but is it really any cleaner than our existing
 technique?  Since you still need to assume you can do a system call
 in a signal handler, it doesn't seem like a real gain in
 bulletproofness to me.

Quoting Stevens (UNPv2, p. 90),

  Posix uses the term *async-signal-safe* to describe the functions that
  may be called from a signal handler.  Figure 5.10 lists these Posix
  functions, along with a few that were added by Unix98.

  Functions not listed may not be called from a signal handler.  Note that
  none of the standard I/O functions ... are listed.  Of all the IPC
  functions covered in this text, only sem_post, read, and write are
  listed (we are assuming the latter two would be used with pipes and
  FIFOs).

Restricting the handler to use those in the approved list seems like an 
automatic improvement to me, even in the apparent absence of evidence 
of problems on those platforms that happen to get tested most.  

  A pipe per backend might be considered pretty expensive.
 
 Pipe per postmaster, no?  That doesn't seem like a huge cost.  

I haven't looked at how complex the signal handling in the backends is;
maybe they don't need anything this fancy.  (OTOH, maybe they should be 
using a pipe to communicate with postmaster, instead of using signals.)

 I'd be
 more concerned about the two extra kernel calls (write and read) per
 signal received, actually.

Are there so many signals flying around?  The signal handler would check 
a flag before writing, so a storm of signals would result in only one 
call to write, and one call to read, per select loop.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] What (not) to do in signal handlers

2001-06-14 Thread Nathan Myers

On Thu, Jun 14, 2001 at 05:10:58PM -0400, Tom Lane wrote:
 Doug McNaught [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  Hm.  That's one way, but is it really any cleaner than our existing
  technique?  Since you still need to assume you can do a system call
  in a signal handler, it doesn't seem like a real gain in
  bulletproofness to me.
 
  Doing write() in a signal handler is safe; doing fprintf() (and
  friends) is not.
 
 If we were calling the signal handlers from random places, then I'd
 agree.  But we're not: we use sigblock to ensure that signals are only
 serviced at the place in the postmaster main loop where select() is
 called.  So there's no actual risk of reentrant use of non-reentrant
 library functions.
 
 Please recall that in practice the postmaster is extremely reliable.
 The single bug we have seen with the signal handlers in recent releases
 was the problem that they were clobbering errno, which was easily fixed
 by saving/restoring errno.  This same bug would have arisen (though at
 such low probability we'd likely never have solved it) in a signal
 handler that only invoked write().  So I find it difficult to buy the
 argument that there's any net gain in robustness to be had here.
 
 In short: this code isn't broken, and so I'm not convinced we should
 fix it.
 
Formally speaking, it *is* broken: we depend on semantics that are
documented as unportable and undefined.  In a sense, we have been so 
unlucky as not to have perceived, thus far, the undefined effects.  

This is no different from depending on finding a NUL at *(char*)0, or 
on being able to say free(p); p = p->next;.  Yes, it appears to work,
at the moment, on some platforms, but that doesn't make it correct.

It may not be terribly urgent to fix it right now, but that's far from
"isn't broken."  It at least merits a TODO entry.

Nathan Myers
[EMAIL PROTECTED]





[HACKERS] Re: Australian timezone configure option

2001-06-13 Thread Nathan Myers

On Thu, Jun 14, 2001 at 12:23:22AM +0000, Thomas Lockhart wrote:
  Surely the correct solution is to have a config file somewhere
  that gets read on startup? That way us Australians don't have to be the only
  ones in the world that need a custom built postgres.
 
 I will point out that you Australians, and, well, us 'mericans, are
 the only countries without the sense to choose unique conventions for
 time zone names.
 
 It sounds like having a second lookup table for the Australian rules is
 a possibility, and this sounds fairly reasonable to me. Btw, is there an
 Australian convention for referring to North American time zones for
 those zones with naming conflicts?

For years I've been on the TZ list, the announcement list for a 
community-maintained database of time zones.  One point they have 
firmly established is that there is no reasonable hope of making 
anything like a standard system of time zone name abbreviations work.  
Legislators and dictators compete for arbitrariness in their time
zone manipulations.

Even if you assign, for your own use, an abbreviation to a particular
administrative region, you still need a history of legislation for that 
region to know what any particular time record (particularly an April 
or September one) really means.

The best practice for annotating times is to tag them with the numeric
offset from UTC at the time the sample is formed.  If the time sample is
the present time, you don't have to know very much to make or use it.  If 
it's in the past, you have to know the legislative history of the place 
to form a proper time record, but not to use it.  If the time is in the 
future, you cannot know what offset will be in popular use at that time, 
but at least you can be precise about what actual time you really mean,
even if you can't be sure about what the wall clock says.  (Actual wall 
clock times are not reliably predictable, a fact that occasionally makes 
things tough on airline passengers.)
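To be concrete, tagging a sample with its numeric offset is a
one-liner with strftime()'s %z conversion (a sketch assuming a POSIX
system; %z is in Single Unix and C99, not C89):

```c
#include <time.h>

/* Format a time with its numeric UTC offset appended, e.g.
 * "2001-05-17 12:24:00 -0700".  %z emits the offset in effect
 * for that moment, so the record stays unambiguous even if the
 * zone's name or rules are changed later by legislation. */
static size_t format_with_offset(time_t t, char *buf, size_t len)
{
    struct tm tm;
    localtime_r(&t, &tm);              /* POSIX reentrant variant */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S %z", &tm);
}
```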

Things are a little more stable in some places (e.g. in Europe it is
improving) but worldwide all is chaos.

Assigning some country's current abbreviations at compile time is madness.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Idea: quicker abort after loss of client connection

2001-06-06 Thread Nathan Myers

On Tue, Jun 05, 2001 at 08:01:02PM -0400, Tom Lane wrote:
 
 Thoughts?  Is there anything about this that might be unsafe?  Should
 QueryCancel be set after *any* failure of recv() or send(), or only
 if certain errno codes are detected (and if so, which ones)?

Stevens identifies some errno codes that are not significant;
in particular, EINTR, EAGAIN, and EWOULDBLOCK.  Of these, maybe
only the first occurs on a blocking socket.
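That is, the caller should retry on the benign codes and treat only
the rest as a lost connection.  A sketch of the triage, assuming a
blocking POSIX socket:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* recv() wrapper doing the errno triage: EINTR is benign (just
 * retry); a return of 0 is orderly EOF; any other failure is the
 * case where the backend might want to set QueryCancel. */
static ssize_t recv_retry(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;                  /* data, or 0 == EOF */
        if (errno == EINTR)
            continue;                  /* interrupted: retry */
        return -1;                     /* genuine failure */
    }
}
```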

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: Interesting Atricle

2001-06-04 Thread Nathan Myers

On Sat, Jun 02, 2001 at 10:59:20AM -0400, Vince Vielhaber wrote:
 On Fri, 1 Jun 2001, Bruce Momjian wrote:
 
Thought some people might find this article interesting.
http://www.zend.com/zend/art/databases.php
  
   The only interesting thing I noticed is how fast it crashes my
   Netscape-4.76 browser ;)
 
  Yours too?  I turned off Java/Javascript to get it to load and I am on
  BSD/OS.  Strange it so universally crashes.
 
 Really odd.  I have Java/Javascript with FreeBSD and Netscape 4.76 and
 read it just fine.  One difference tho probably, I keep style sheets
 shut off.  Netscape crashes about 1% as often as it used to.

This is getting off-topic, but ... 

I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and
Netscape 4.77 stays up for many weeks at a time.  I also have no Flash 
plugin.  All together it makes for a far more pleasant web experience.

I didn't notice any problem with the Zend page.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Re: Interesting Atricle

2001-06-04 Thread Nathan Myers

On Mon, Jun 04, 2001 at 04:55:13PM -0400, Bruce Momjian wrote:
  This is getting off-topic, but ... 
  
  I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and
  Netscape 4.77 stays up for many weeks at a time.  I also have no Flash 
  plugin.  All together it makes for a far more pleasant web experience.
  
  I didn't notice any problem with the Zend page.
 
 You are running no images!  You may as well have Netscape minimized and
 say it is running for weeks.  :-)

Over 98% of the images on the web are either pr0n or wankage.  
If you don't need to see that, you can save a lot of time.

But it's usually Javascript that crashes Netscape.  (CSS appears to
be implemented using Javascript, because if you turn off Javascript,
then CSS stops working (and crashing).) That's not to say that Java 
doesn't also crash Netscape; it's just that pages with Java in them 
are not very common.

There's little point in bookmarking a site that depends on client-side
Javascript or Java, because it won't be up for very long.

But this is *really* off topic, now.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Imperfect solutions

2001-05-31 Thread Nathan Myers

On Thu, May 31, 2001 at 10:07:36AM -0400, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  What got me thinking about this is that I don't think my gram.y fix
  would be accepted given the current review process,
 
 Not to put too fine a point on it: the project has advanced a long way
 since you did that code.  Our standards *should* be higher than they
 were then.
 
  and that is bad
  because we would have to live with no LIKE optimization for 1-2 years
  until we learned how to do it right.
 
 We still haven't learned how to do it right, actually.  I think the
 history of the LIKE indexing problem is a perfect example of why fixes
 that work for some people but not others don't survive long.  We put out
 several attempts at making it work reliably in non-ASCII locales, but
 none of them have withstood the test of actual usage.
 
  I think there are a few rules we can use to decide how to deal with
  imperfect solutions:
 
 You forgot
 
 * will the fix institutionalize user-visible behavior that will in the
   long run be considered the wrong thing?
 
 * will the fix contort new code that is written in the same vicinity,
   thereby making it harder and harder to replace as time goes on?
 
 The first of these is the core of my concern about %TYPE.

This list points up a problem that needs a better solution than a 
list: you have to put in questionable features now to get the usage 
experience you need to do it right later.  The set of prospective
features that meet that description does not resemble the set that
would pass all the criteria in the list.

This is really a familiar problem, with a familiar solution.  
When a feature is added that is wrong, make sure it's marked 
somehow -- at worst, in the documentation, but ideally with a 
NOTICE or something when it's used -- as experimental.  If anybody 
complains later, when you rip it out and redo it correctly, that you 
broke his code, you can just laugh, and add, if you're feeling 
charitable, "experimental features are not to be depended on."

--
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Re: charin(), text_char() should return something else for empty input

2001-05-29 Thread Nathan Myers

On Mon, May 28, 2001 at 02:37:32PM -0400, Tom Lane wrote:
 I wrote:
  I propose that both of these operations should return a space character
  for an empty input string.  This is by analogy to space-padding as you'd
  get with char(1).  Any objections?
 
 An alternative approach is to make charin and text_char map empty
 strings to the null character (\0), and conversely make charout and
 char_text map the null character to empty strings.  charout already
 acts that way, in effect, since it has to produce a null-terminated
 C string.  This way would have the advantage that there would still
 be a reversible dump and reload representation for a char field
 containing '\0', whereas space-padding would cause such a field to
 become ' ' after reload.  But it's a little strange if you think that
 char ought to behave the same as char(1).

Does the standard require any particular behavior with NUL 
characters?  I'd like to see PG move toward treating them as ordinary 
control characters.  I realize that at best it will take a long time 
to get there.  C is irretrievably mired in the "NUL is a terminator"
swamp, but SQL isn't C.
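The swamp in miniature: every str*() function believes the data ends
at the first NUL, while a counted (pointer + length) representation,
like a varlena text value, keeps all the bytes.

```c
#include <string.h>

/* Five real bytes with an embedded NUL.  strlen() sees only two
 * of them; only a counted representation preserves the rest. */
static const char data[5] = { 'a', 'b', '\0', 'c', 'd' };

static size_t c_visible_len(const char *s)
{
    return strlen(s);                  /* stops at the first '\0' */
}
```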

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] BSD gettext

2001-05-24 Thread Nathan Myers

On Thu, May 24, 2001 at 10:30:01AM -0400, Bruce Momjian wrote:
  The HPUX man page for mmap documents its failure return value as -1,
  so I hacked around this with
  
  #ifndef MAP_FAILED
  #define MAP_FAILED ((void *) (-1))
  #endif
  
  whereupon it built and passed the simple self-test you suggested.
  However, I think it's pretty foolish to depend on mmap for such
  little reason as this code does.  I suggest ripping out the mmap
  usage and just reading the file with good old read(2).
 
 Agreed.  Let read() use mmap() internally if it wants to.

The reason mmap() is faster than read() is that it can avoid copying 
data to the place you specify.  read() can use mmap() internally only 
in cases rare enough to hardly be worth checking for.  

Stdio is often able to use mmap() internally for parsing, and in 
glibc-2.x (and, I think, on recent Solaris and BSDs) it does.  Usually, 
therefore, it would be better to use stdio functions (except fread()!) 
in place of read(), where possible, to allow this optimization.

Using mmap() in place of disk read() almost always results in enough
performance improvement to make doing so worth a lot of disruption.
Today mmap() is used heavily enough, in important programs, that 
worries about unreliability are no better founded than worries about
read().
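For what it's worth, the two approaches can coexist: check mmap()
against MAP_FAILED (with the workaround quoted above, where the header
lacks the macro) and fall back to plain read(2) on any failure.  A
sketch, assuming POSIX:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FAILED                     /* pre-SUSv2 headers, e.g. old HPUX */
#define MAP_FAILED ((void *) -1)
#endif

/* Copy the first 'want' bytes of a file into 'out', preferring
 * mmap() but falling back to read(2).  Note mmap() reports
 * failure with MAP_FAILED, never NULL. */
static long file_first_bytes(const char *path, char *out, long want)
{
    int  fd = open(path, O_RDONLY);
    long got;

    if (fd < 0)
        return -1;
    void *p = mmap(NULL, (size_t) want, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p != MAP_FAILED) {
        memcpy(out, p, (size_t) want); /* no second copy by the kernel */
        munmap(p, (size_t) want);
        got = want;
    } else {
        got = (long) read(fd, out, (size_t) want);  /* good old read(2) */
    }
    close(fd);
    return got;
}
```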

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] More pgindent follies

2001-05-23 Thread Nathan Myers

On Wed, May 23, 2001 at 11:58:51AM -0400, Bruce Momjian wrote:
   I don't see the problem here.  My assumption is that the comment is not
   part of the define, right?
  
  Well, that's the question.  ANSI C requires comments to be replaced by
  whitespace before preprocessor commands are detected/executed, but there
  was an awful lot of variation in preprocessor behavior before ANSI.
  I suspect there are still preprocessors out there that might misbehave
  on this input --- for example, by leaving the text * end-of-string */
  present in the preprocessor output.  Now we still go to considerable
  lengths to support not-quite-ANSI preprocessors.  I don't like the idea
  that all the work done by configure and c.h in that direction might be
  wasted because of pgindent carelessness.
 
 I agree, but in a certain sense, we would have found those compilers
 already.  This is not new behaviour as far as I know, and clearly this
 would throw a compiler error.

This is good news!

Maybe this process can be formalized.  That is, each official release 
might contain a source file with various modern constructs which we 
suspect might break old compilers.

A comment block at the top requests that any breakage be reported.

A configure option would allow a user to avoid compiling it, and a
comment in the file would explain how to use the option.  After a
major release, any modern construct that caused no trouble in the 
last release is considered OK to use.

This process makes it easy to leave behind obsolete language 
restrictions: if you wonder if it's OK now to use a feature that once 
broke some crufty platform, drop it in modern.c and forget about it.  
After the next release, you know the answer.
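To make the idea concrete, a hypothetical modern.c might begin like
this (the file name, the configure option, and the particular probes
are all made up for illustration):

```c
/* modern.c -- probe file: each construct below is suspected of
 * breaking some old compiler.  If this file fails to compile on
 * your platform, please report it to the list; configure with
 * --without-modern-probe (hypothetical) to skip building it. */

/* a comment inside a #define, the case from the pgindent thread */
#define ANSWER 42 /* trailing comment in a macro definition */

/* ANSI string literal concatenation */
static const char greeting[] = "hello, " "world";

/* const-qualified parameter in a prototype */
static int modern_probe(const int x)
{
    return x + ANSWER;
}
```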

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] C++ Headers

2001-05-22 Thread Nathan Myers

On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote:
  This in fact has happened within ECPG. But since sizeof(bool) is passed to
  libecpg it was possible to figure out which 'bool' is requested.
  
  Another issue of C++ compatibility would be cleaning up the usage of
  'const' declarations. C++ is really strict about 'const'ness. But I don't
  know whether postgres' internal headers would need such a cleanup. (I
  suspect that in ecpg there is an oddity left with respect to host variable
  declaration. I'll check that later)
 
 We have added more const-ness to libpq++ for 7.2.

Breaking link compatibility without bumping the major version number
on the library seems to me a serious no-no.

To const-ify member functions without breaking link compatibility,
you have to add another, overloaded member that is const, and turn
the non-const function into a wrapper.  For example:

  void Foo::bar() { ... }   // existing interface

becomes

  void Foo::bar() { ((const Foo*)this)->bar(); }   
  void Foo::bar() const { ... }   

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] C++ Headers

2001-05-22 Thread Nathan Myers

On Tue, May 22, 2001 at 05:52:20PM -0400, Bruce Momjian wrote:
  On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote:
This in fact has happened within ECPG. But since sizeof(bool) is
passed to libecpg it was possible to figure out which 'bool' is
requested.
   
Another issue of C++ compatibility would be cleaning up the
usage of 'const' declarations. C++ is really strict about
'const'ness. But I don't know whether postgres' internal headers
would need such a cleanup. (I suspect that in ecpg there is an
oddity left with respect to host variable declaration. I'll
check that later)
  
   We have added more const-ness to libpq++ for 7.2.
  
  Breaking link compatibility without bumping the major version number
  on the library seems to me serious no-no.
  
  To const-ify member functions without breaking link compatibility,
  you have to add another, overloaded member that is const, and turn
  the non-const function into a wrapper.  For example:
  
void Foo::bar() { ... }   // existing interface
  
  becomes
  
    void Foo::bar() { ((const Foo*)this)->bar(); }   
    void Foo::bar() const { ... }   
 
 Thanks.  That was my problem, not knowing when I break link compatibility
 in C++.  Major updated.

Wouldn't it be better to add the forwarding function and keep
the same major number?  It's quite disruptive to change the
major number for what are really very minor changes.  Otherwise
you accumulate lots of near-copies of almost-identical libraries
to be able to run old binaries.

A major-number bump should usually be something planned for
and scheduled.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] storage density

2001-05-18 Thread Nathan Myers


When organizing available free storage for re-use, we will probably have
a choice whether to favor using space in (mostly-) empty blocks, or in 
mostly-full blocks.  Empty and mostly-empty blocks are quicker -- you 
can put lots of rows in them before they fill up and you have to choose 
another.   Preferring mostly-full blocks improves active-storage and 
cache density because a table tends to occupy fewer total blocks.

Does anybody know of papers that analyze the tradeoffs involved?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Upgrade issue (again).

2001-05-18 Thread Nathan Myers

On Thu, May 17, 2001 at 12:43:49PM -0400, Rod Taylor wrote:
 Best way to upgrade might be to do something as simple as get the
 master to master replication working.

Master-to-master replication is not simple, and (fortunately) isn't 
strictly necessary.  The minimal sequence is,

1. Start a backup and a redo log at the same time.
2. Start the new database and read the backup.
3. Get the new database consuming the redo logs.
4. When the new database catches up, make it a hot failover for the old.
5. Turn off the old database and fail over.

The nice thing about this approach is that all the parts used are 
essential parts of an enterprise database anyway, regardless of their 
usefulness in upgrading.  

Master-to-master replication is nice for load balancing, but not
necessary for failover.  Its chief benefit, there, is that you wouldn't 
need to abort the uncompleted transactions on the old database when 
you make the switch.  But master-to-master replication is *hard* to
make work, and intrusive besides.

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Plans for solving the VACUUM problem

2001-05-18 Thread Nathan Myers

On Fri, May 18, 2001 at 06:10:10PM -0700, Mikheev, Vadim wrote:
  Vadim, can you remind me what UNDO is used for?
 
 Ok, last reminder -:))
 
 On transaction abort, read WAL records and undo (rollback)
 changes made in storage. Would allow:
 
 1. Reclaim space allocated by aborted transactions.
 2. Implement SAVEPOINTs.
Just to remind -:) - in the event of error discovered by server
- duplicate key, deadlock, command mistyping, etc, - transaction
will be rolled back to the nearest implicit savepoint setted
just before query execution; - or transaction can be aborted by
ROLLBACK TO savepoint_name command to some explicit savepoint
setted by user. Transaction rolled back to savepoint may be continued.
 3. Reuse transaction IDs on postmaster restart.
 4. Split pg_log into small files with ability to remove old ones (which
do not hold statuses for any running transactions).

I missed the original discussions; apologies if this has already been
beaten into the ground.  But... mightn't sub-transactions be a 
better-structured way to expose this service?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] End-to-end paper

2001-05-17 Thread Nathan Myers


For those of you who have missed it, here

http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end&hl=en

is the paper some of us mention, END-TO-END ARGUMENTS IN SYSTEM DESIGN
by Saltzer, Reed, and Clark.

The abstract is:

This paper presents a design principle that helps guide placement of
functions among the modules of a distributed computer system. The
principle, called the end-to-end argument, suggests that functions
placed at low levels of a system may be redundant or of little value
when compared with the cost of providing them at that low level.
Examples discussed in the paper include bit error recovery, security
using encryption, duplicate message suppression, recovery from
system crashes, and delivery acknowledgement. Low level mechanisms
to support these functions are justified only as performance
enhancements.

It was written in 1981 and is undiminished by the subsequent decades.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Re: End-to-end paper

2001-05-17 Thread Nathan Myers

On Thu, May 17, 2001 at 06:04:54PM +0800, Lincoln Yeoh wrote:
 At 12:24 AM 17-05-2001 -0700, Nathan Myers wrote:
 
 For those of you who have missed it, here
 
 
http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end&hl=en
 
 is the paper some of us mention, END-TO-END ARGUMENTS IN SYSTEM DESIGN
 by Saltzer, Reed, and Clark.
 
 The abstract is:
 
 This paper presents a design principle that helps guide placement
 of functions among the modules of a distributed computer system.
 The principle, called the end-to-end argument, suggests that
 functions placed at low levels of a system may be redundant or
 of little value when compared with the cost of providing them
 at that low level. Examples discussed in the paper include
 bit error recovery, security using encryption, duplicate
 message suppression, recovery from system crashes, and delivery
 acknowledgement. Low level mechanisms to support these functions
 are justified only as performance enhancements.
 
 It was written in 1981 and is undiminished by the subsequent decades.

 Maybe I don't understand the paper.

Yes.  It bears re-reading.

 The end-to-end argument might be true if taking the monolithic
 approach. I find more useful ideas gleaned from the RFCs, TCP/IP and
 the OSI 7 layer model: modularity, useful standard interfaces, Be
 liberal in what you accept, and conservative in what you send and so
 on.

The end-to-end principle has had profound effects on the design of 
Internet protocols, perhaps most importantly in keeping them simpler 
than OSI's.

 Within a module I figure the end to end argument might hold,

The end-to-end principle isn't particularly applicable within a module.
It's a system-design principle. Its prescription for individual modules
is: don't imagine that anybody else gets much value from your complex
error recovery shenanigans; they have to do their own error recovery
anyway. You provide more value by making a good effort.

 but the author keeps talking about networks and networking.

Of course networking is just an example, but it's a particularly
good example. Data storage (e.g. disk) is another good example; in
the context of the paper it may be thought of as a mechanism for
communicating with other (later) times. The point there is that the CRCs
and ECC performed by the disk are not sufficient to ensure reliability
for the system (e.g. database service); for that, end-to-end measures
such as hot-failover, backups, redo logs, and block- or record-level
CRCs are needed. The purpose of the disk CRCs is not reliability, a job
they cannot do alone, but performance: they help make the need to use
the backups and redo logs infrequent enough to be tolerable.
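A record-level CRC of the sort meant here is tiny -- e.g. the familiar
CRC-32 polynomial, computed and checked above the storage layer (a
bit-at-a-time sketch; production code would use a lookup table):

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (the Ethernet/zlib polynomial, reflected form).  An
 * end-to-end check like this catches corruption the disk's own
 * ECC never sees: bad cables, driver bugs, stray writes. */
static uint32_t crc32_calc(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```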

 SSL and TCP are useful. The various CRC checks down the IP stack to
 the datalink layer have their uses too.

Yes, of course they are useful. The authors say so in the paper, and
they say precisely how (and how not).

 By splitting stuff up at appropriate points, adding or substituting
 objects at various layers becomes so much easier. People can download
 Postgresql over token ring, Gigabit ethernet, X.25 and so on.

As noted in the paper, the principle is most useful in helping to decide
what goes in each layer.

 Splitting stuff up does mean that the bits and pieces now do have
 a certain responsibility. If those responsibilities involve some
 redundancies in error checking or encryption or whatever, so be
 it, because if done well people can use those bits and pieces in
 interesting ways never dreamed of initially.

 For example SSL over TCP over IPSEC over encrypted WAP works (even
 though IPSEC is way too complicated :)). There's so much redundancy
 there, but at the same time it's not a far fetched scenario - just
 someone ordering online on a notebook pc.

The authors quote a similar example in the paper, even though it was
written twenty years ago.

 But if a low level module never bothered with error
 correction/detection/handling or whatever and was optimized for
 an application specific purpose, it's harder to use it for other
 purposes. And if you do, some chap could post an article to Bugtraq on
 it, mentioning exploit, DoS or buffer overflow.

The point is that leaving that stuff _out_ is how you keep low-level
mechanisms useful for a variety of purposes. Putting in complicated
error-recovery stuff might suit it better for a particular application,
but make it less suitable for others.

This is why, at the IP layer, packets get tossed at the first sign of
congestion. It's why TCP connections often get dropped at the first sign
of a data-format violation. This is a very deep principle; understanding
it thoroughly will make you a much better system designer.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister

Re: [HACKERS] Configurable path to look up dynamic libraries

2001-05-15 Thread Nathan Myers

On Tue, May 15, 2001 at 05:53:36PM -0400, Bruce Momjian wrote:
  But, if I may editorialize a little myself, this is just indicative of a 
  'Fortress PostgreSQL' attitude that is easy to get into.  'We've always
 
 I have to admit I like the sound of 'Fortress PostgreSQL'.  :-)

Ye Olde PostgreSQL Shoppe
The PostgreSQL of Giza
Our Lady of PostgreSQL, Ascendant
PostgreSQL International Airport
PostgreSQL Galactica
PostgreSQL's Tavern

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Cursor support in pl/pg

2001-04-25 Thread Nathan Myers

Now that 7.1 is safely in the can, is it time to consider
this patch?  It provides cursor support in PL.

  http://www.airs.com/ian/postgresql-cursor.patch

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] tables/indexes/logs on different volumes

2001-04-25 Thread Nathan Myers

On Wed, Apr 25, 2001 at 09:41:57AM -0300, The Hermit Hacker wrote:
 On Tue, 24 Apr 2001, Nathan Myers wrote:
 
  On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote:
   I have a Dual-866, 1gig of RAM and strip'd file systems ... this past
   week, I've hit many times where CPU usage is 100%, RAM is 500Meg free
   and disks are pretty much sitting idle ...
 
  Assuming "strip'd" above means "striped", it strikes me that you
  might be much better off operating the drives independently, with
  the various tables, indexes, and logs scattered each entirely on one
  drive.
 
 have you ever tried to maintain a database doing this?  PgSQL is
 definitely not designed for this sort of setup, I had symlinks going
 everywhere, and with the new numbering schema, this is even more 
 difficult to try and do :)

Clearly you need to build a tool to organize it.  It would help a lot if 
PG itself could provide some basic assistance, such as calling a stored
procedure to generate the pathname of the file.

Has there been any discussion of anything like that?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] refusing connections based on load ...

2001-04-23 Thread Nathan Myers

On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote:
 
 Anyone thought of implementing this, similar to how sendmail does it?  If
 load  n, refuse connections?
 ... 
 If nobody is working on something like this, does anyone but me feel that
 it has merit to make use of?  I'll play with it if so ...

I agree that it would be useful.  Even more useful would be soft load 
shedding, where once some load average level is exceeded the postmaster 
delays a bit (proportionately) before accepting a connection.  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] refusing connections based on load ...

2001-04-23 Thread Nathan Myers

On Mon, Apr 23, 2001 at 10:50:42PM -0400, Tom Lane wrote:
 Basically, if we do this then we are abandoning the notion that Postgres
 runs as an unprivileged user.  I think that's a BAD idea, especially in
 an environment that's open enough that you might feel the need to
 load-throttle your users.  By definition you do not trust them, eh?

No.  It's not a case of trust, but of providing an adaptive way
to keep performance reasonable.  The users may have no independent
way to cooperate to limit load, but the DB can provide that.

 A less dangerous way of approaching it might be to have an option
 whereby the postmaster invokes 'uptime' via system() every so often
 (maybe once a minute?) and throttles on the basis of the results.
 The reaction time would be poorer, but security would be a whole lot
 better.

Yes, this alternative looks much better to me.  On Linux you have
the much more efficient alternative, /proc/loadavg.  (I wouldn't
use system(), though.)
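On Linux the whole check is a few lines (a sketch; elsewhere you would
use getloadavg() where available, or fall back to running uptime):

```c
#include <stdio.h>

/* Return the 1-minute load average from /proc/loadavg (Linux),
 * or a negative value on failure.  The postmaster could poll
 * this every minute and delay accept() proportionally once it
 * passes a configured threshold. */
static double read_loadavg_1min(void)
{
    FILE  *f = fopen("/proc/loadavg", "r");
    double load = -1.0;

    if (f == NULL)
        return -1.0;
    if (fscanf(f, "%lf", &load) != 1)
        load = -1.0;
    fclose(f);
    return load;
}
```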

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Is it possible to mirror the db in Postgres?

2001-04-20 Thread Nathan Myers

On Fri, Apr 20, 2001 at 03:33:38PM -0700, G. Anthony Reina wrote:
 We use Postgres 7.0.3 to store data for our scientific research. We have
 two other labs in St. Louis, MO and Tempe, AZ. I'd like to see if
 there's a way for them to mirror our database. They would be able to
 update our database when they received new results and we would be able
 to update theirs. So, in effect, we'd have 3 copies of the same db. Each
 copy would be able to update the other.
 
 Any thoughts on if this is possible?

Does the replication have to be reliable?  Are you equipped to
reconcile databases that have got out of sync when it's not?  Will the
different labs ever try to update the same existing record, or
insert conflicting (unique-key) records?

Symmetric replication is easy or impossible, but usually somewhere 
in between, depending on many details.  Usually when it's made to
work, it runs on a LAN.  

Reliable WAN replication is harder.  Most of the proprietary database 
companies will tell you they can do it, but their customers will tell 
you they can't.  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Re: Is it possible to mirror the db in Postgres?

2001-04-20 Thread Nathan Myers

On Fri, Apr 20, 2001 at 04:53:43PM -0700, G. Anthony Reina wrote:
 Nathan Myers wrote:
 
  Does the replication have to be reliable?  Are you equipped to
  reconcile databases that have got out of sync, when it's not?  
  Will the different labs ever try to update the same existing 
  record, or insert conflicting (unique-key) records?
 
 (1) Yes, of course.  (2) Willing--yes; equipped--dunno.   (3) Yes,
 probably.

Hmm, good luck.  Replication, by itself, is not hard, but it's only
a tiny part of the job.  Most of the job is in handling failures
and conflicts correctly, for some (usually enormous) definition of
"correctly".

  Reliable WAN replication is harder.  Most of the proprietary database
  companies will tell you they can do it, but their customers will tell
  you they can't.
 
 Joel Burton suggested the rserv utility. I don't know how well it would
 work over a wide network.

The point about WANs is that things which work nicely in the lab, on a 
LAN, behave very differently when the communication medium is, like the 
Internet, only fitfully reliable.  You will tend to have events occurring
in unexpected order, and communications lost, and queues overflowing, 
and conflicting entries in different instances which you must somehow 
reconcile after the fact.  Reconciliation by shipping the whole database 
across the WAN is often impractical, particularly when you're trying to
use it at the same time.

WAN replication is an important part of Zembu's business, and it's hard.
I would expect the rserv utility (about which I admit I know little) not
to have been designed for the job.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] CRN article not updated

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 02:22:48PM -0400, Bruce Momjian wrote:
 I just checked the CRN PostgreSQL article at:
 
http://www.crn.com/Sections/Fast_Forward/fast_forward.asp?ArticleID=25670
 
 I see no changes to the article, even though Vince our webmaster, Geoff
 Davidson of PostgreSQL, Inc, and Dave Mele of Great Bridge have
 requested it be fixed.  

If _you_ had been deluged with that kind of vitriol, what kind of favors 
would you feel like doing?

 Not sure what we can do now.

It's too late.  "We" screwed it up.  (Thanks again, guys.)
The responses have done far more lasting damage than any article 
could ever have done.  The horse is dead.  

The best we can do is to plan for the future.  

1. What happens the next time a slightly inaccurate article is published? 
2. What happens when an openly hostile article is published?

Will our posse ride off again with guns blazing, making more enemies?  
Will they make us all look to potential users like a bunch of hotheaded, 
childish nobodies?

Or will we have somebody appointed, already, to write a measured,
rational, mature clarification?  Will we have articles already written,
and handed to more responsible reporters, so that an isolated badly-done 
article can do little damage?

We're not even on Oracle's radar yet.  When PG begins to threaten their 
income, their marketing department will go on the offensive.  Oracle 
marketing is very, very skillful, and very, very nasty.  If they find 
that by seeding the press with reasonable-sounding criticisms of PG, 
they can prod the PG community into making itself look like idiots, 
they will go to town on it.

Nathan Myers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] timeout on lock feature

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 09:54:11AM +0200, Zeugswetter Andreas SB wrote:
   In short, I think lock timeout is a solution searching in vain for a
   problem.  If we implement it, we are just encouraging bad application
   design.
  
  I agree with Tom completely here.
  
  In any real-world application the database is the key component of a 
  larger system: the work it does is the most finicky, and any mistakes
  (either internally or, more commonly, from misuse) have the most 
  far-reaching consequences.  The responsibility of the database is to 
  provide a reliable and easily described and understood mechanism to 
  build on.
 
 It is not something that makes anything unreliable or less robust.
 It is also simple: "I (the client) request that you (the backend) 
 don't wait for any lock longer than x seconds."

Many things that are easy to say have complicated consequences.

  Timeouts are a system-level mechanism that to be useful must refer to 
  system-level events that are far above anything that PG knows about.
 
 I think you are talking about different kinds of timeouts here.  

Exactly.  I'm talking about useful, meaningful timeouts, not random
timeouts attached to invisible events within the database.

  The only way PG could apply reasonable timeouts would be for the 
  application to dictate them, 
 
 That is exactly what we are talking about here.

No.  You wrote elsewhere that the application sets "30 seconds" and
leaves it.  But that 30 seconds doesn't have any application-level
meaning -- an operation could take twelve hours without tripping your
30-second timeout.  For the application to dictate the timeouts
reasonably, PG would have to expose all its lock events to the client
and expect it to deduce how they affect overall behavior.

  but the application can better implement them itself.
 
 It can, but it makes the program more complicated (needs timers 
 or threads), which violates your last statement about the "simplest interface".

It is good for the program to be more complicated if it is doing a 
more complicated thing -- if it means the database may remain simple.  
People building complex systems have an even greater need for simple
components than people building little ones.
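For illustration, a client really can apply its own timeout around a blocking call with nothing more than a timer thread. The sketch below is hypothetical (the `run_with_timeout` helper is invented for this example, not part of any PG client API); it shows the shape of the extra machinery the application needs:

```python
import threading

def run_with_timeout(func, timeout_seconds):
    """Run func in a worker thread; raise TimeoutError if it does
    not finish within timeout_seconds.  The caller, not the
    database, decides what "timed out" means."""
    result = {}

    def worker():
        result["value"] = func()

    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()
    t.join(timeout_seconds)
    if t.is_alive():
        # The query (or lock wait) is still pending on the server side;
        # the client chooses how to abandon or cancel it.
        raise TimeoutError("operation exceeded %s seconds" % timeout_seconds)
    return result["value"]
```

The timeout here bounds the whole operation as the application sees it, which is exactly the kind of meaning a per-lock timeout inside the backend cannot express.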
  
What might be a reasonable alternative would be a BEGIN timeout: report 
failure as soon as possible after N seconds unless the timer is reset, 
such as by a commit.  Such a timeout would be meaningful at the 
database-interface level.  It could serve as a useful building block 
for application-level timeouts when the client environment has trouble 
applying timeouts on its own.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] timeout on lock feature

2001-04-18 Thread Nathan Myers

On Wed, Apr 18, 2001 at 07:33:24PM -0400, Bruce Momjian wrote:
  What might be a reasonable alternative would be a BEGIN timeout: report 
  failure as soon as possible after N seconds unless the timer is reset, 
  such as by a commit.  Such a timeout would be meaningful at the 
  database-interface level.  It could serve as a useful building block 
  for application-level timeouts when the client environment has trouble 
  applying timeouts on its own.
 
 Now that is a nifty idea.  Just put it on one command, BEGIN, and have
 it apply for the whole transaction.  We could just set an alarm and do a
 longjump out on timeout.

Of course, it raises the question of why the client couldn't do that
itself, and leave PG out of the picture.  But that's what we've 
been talking about all along.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Another news story in need of 'enlightenment'

2001-04-17 Thread Nathan Myers

On Tue, Apr 17, 2001 at 01:31:43PM -0400, Lamar Owen wrote:
 This one probably needs the 'iron hand and the velvet paw' touch.  The
 iron hand to pound some sense into the author, and the velvet paw to
 make him like having sense pounded into him. Title of article is 'Open
 Source Databases Won't Fly' --
 http://www.dqindia.com/content/enterprise/datawatch/101041201.asp

This one is best just ignored.  

It's content-free, just his frightened opinions.  The only thing 
that will change his mind is the improvements planned for releases 
7.2 and 7.3, and lots of deployments.  Few will read his rambling.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] timeout on lock feature

2001-04-17 Thread Nathan Myers

On Tue, Apr 17, 2001 at 12:56:11PM -0400, Tom Lane wrote:
 In short, I think lock timeout is a solution searching in vain for a
 problem.  If we implement it, we are just encouraging bad application
 design.

I agree with Tom completely here.

In any real-world application the database is the key component of a 
larger system: the work it does is the most finicky, and any mistakes
(either internally or, more commonly, from misuse) have the most 
far-reaching consequences.  The responsibility of the database is to 
provide a reliable and easily described and understood mechanism to 
build on.  

Timeouts are a system-level mechanism that to be useful must refer to 
system-level events that are far above anything that PG knows about.  
The only way PG could apply reasonable timeouts would be for the 
application to dictate them, but the application can better implement 
them itself.

You can think of this as another aspect of the "end-to-end" principle: 
any system-level construct duplicated in a lower-level system component 
can only improve efficiency, not provide the corresponding high-level 
service.  If we have timeouts in the database, they should be there to
enable the database to better implement its abstraction, and not pretend 
to be a substitute for system-level timeouts.

There's no upper limit on how complicated a database interface can
become (cf. Oracle).  The database serves its users best by having 
the simplest interface that can possibly provide the needed service. 

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Fast Forward (fwd)

2001-04-15 Thread Nathan Myers

On Sun, Apr 15, 2001 at 01:17:15AM -0400, Vince Vielhaber wrote:
 
 Here's my response to the inaccurate article cmp produced.  After
 chatting with Marc I decided to post it myself.
 ... 
 Where do you get your info?  Do you just make it up?  PostgreSQL is
 not a product of Great Bridge and never has been.  It's 100% independent.
 Is Linux a keyword you figure you can use to draw readers?  Won't take
 long before folks determine you're full of it.  The PostgreSQL team takes
 great pride (not to be confused with great bridge) in ensuring that the
 work we do runs on ALL platforms; be it Mac's OSX, FreeBSD 4.3, or even
 Windows 2000.  So why do you figure this is a Great Bridge product?  Why
 do you figure it's Linux only?  What is it with you writers lately?  Are
 you getting lazy and simply using Linux as a quick out for a paycheck?

This is probably a good time to point out that this is the _worst_
_possible_ response to erroneous reportage.  The perception by readers
will not be that the reporter failed, but that PostgreSQL advocates are 
rabid weasels who don't appreciate favorable attention, and are dangerous 
to write anything about.  You can bet this reporter and her editor will 
treat the topic very circumspectly (i.e. avoid it) in the future.  
When they have to mention it, their reporting will be colored by their 
personal experience.  They (and their readers) don't run the code, 
so they must get their impressions from those who do.  

Most reporters are ignorant, most reporters are lazy, and many
are both.  It's part of the job description.  Getting angry about
it is like getting angry at birds for fouling their cage.  Their
job is to regurgitate what they're given, and quickly.  They have no 
time to learn the depths, or to write coherently about it, or even 
to check facts.

None of the errors in the article matter.  Nobody will develop an
enduring impression of PG from them.  What matters is that PG is being 
mentioned in the same article with Oracle.  In her limited way, she
did the PG community the biggest favor in her limited power, and all 
we can do is attack?

It will be harder than the original mailings, but I urge each who
wrote to write again and apologize for attacking her.  Thank her 
graciously for making an effort, and offer to help her check her 
facts next time.  PostgreSQL needs friends in the press, even if
they are ignorant or lazy.  It doesn't need any enemies in the press.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Fast Forward (fwd)

2001-04-15 Thread Nathan Myers

On Sun, Apr 15, 2001 at 11:44:48AM -0300, The Hermit Hacker wrote:
 On Sat, 14 Apr 2001, Nathan Myers wrote:
 
  This is probably a good time to point out that this is the _worst_
  _possible_ response to erroneous reportage.  The perception by readers
  will not be that the reporter failed, but that PostgreSQL advocates
  are rabid weasels who don't appreciate favorable attention, and are
 
 favorable attention??

Yes, totally favorable.   There wasn't a hint of the condescension 
typically accorded free software.  All of the details you find so 
objectionable (April vs. June?  "The" marketing arm vs. "a" marketing
arm?) would not even be noticed by a non-cultist.

  dangerous to write anything about.  You can bet this reporter and her
  editor will treat the topic very circumspectly (i.e. avoid it) in the
  future.
 
 woo hoo, if that is the result, then I think Vince did us a great service,
 not dis-service ...

False.  

This may have been the reporter's and the editor's first direct
exposure to free software advocates.  You guys came across as 
hate-filled religious whackos, and that reflects on all of us.  

  Most reporters are ignorant, most reporters are lazy, and many are
  both.  It's part of the job description.  Getting angry about it is
  like getting angry at birds for fouling their cage.  Their job is to
  regurgitate what they're given, and quickly.  They have no time to
  learn the depths, or to write coherently about it, or even to check
  facts.
 
 Out of all the articles on PgSQL that I've read over the years, this one
 should have been shot before it hit the paper (so to say) ... it was the
 most blatantly inaccurate article I've ever read ...

It had a number of minor errors, easily corrected.  The next will 
probably talk about what a bunch of nasty cranks and lunatics 
PostgreSQL fans are, unless you who wrote can display a lot more 
finesse in your apologies.  Thanks a lot, guys.

  It will be harder than the original mailings, but I urge each who
  wrote to write again and apologize for attacking her.
 
 In a way, I think you are right .. I think the attack was aimed at the
 wrong ppl :(  She obviously didn't get *any* of her information from ppl
 that belong *in* the Pg community, or that have any knowledge of how it
 works, or of its history :(

How is this reporter going to have developed contacts within the 
community?  She has just started.  Now you've burnt her to a crisp, 
and she will figure the less contact with that "community" she has, 
the happier she'll be.  Her editor will know that mentioning PG in
any context will result in a raft of hate mail from cranks, and will 
treat press releases from our community with the scorn they have earned.

Reporters are fragile creatures, and must be gently guided toward the
light.  They will always get facts wrong, but that matter not at all.
The overall tone of the writing is the only thing that stays with their
equally dim audience.  That dim audience controls the budgets for 
technology deployment, including databases.  Next time you propose a
deployment on PG instead of Oracle, thank Vince et al. when it's 
dismissed as a crank toy.

Finally, their talkback page was most probably implemented _not_ with 
MySQL, but with MS SQL Server.  These intramural squabbles (between 
MySQL and PG, between Linux and BSD, between NetBSD and OpenBSD) are 
justifiably seen as pathetic in the outside world.  Respectful attention 
among projects doesn't just create a better impression, it also allows 
you, maybe, to learn something.  (MySQL is not objectively as good as 
PG, but those guys are doing something right, in their presentation, 
that some of us could learn from.)

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: Hey guys, check this out.

2001-04-15 Thread Nathan Myers

On Sun, Apr 15, 2001 at 10:05:46PM -0400, Vince Vielhaber wrote:
 On Mon, 16 Apr 2001, Lincoln Yeoh wrote:
 
  Maybe you guys should get some Great Bridge marketing/PR person to handle
  stuff like this.
 
 After reading Ned's comments I figured that's how it got that way in
 the first place.  But that's just speculation.

You probably figured wrong.  

All those publications have editors who generally feel they're not 
doing their job if they don't introduce errors, usually without even 
talking to the reporter.  That's probably how the "FreeBSD" reference 
got in there: somebody saw "Berkeley" and decided "FreeBSD" would look 
more "techie".  It's stupid, but nothing to excoriate the reporter about.

Sam Williams's articles read completely differently according to 
who publishes them.  Typically the Linux magazines print what he 
writes, and thereby get it mostly right, but the finance magazines 
mangle them to total nonsense.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Truncation of object names

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 01:16:43AM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  We have noticed here also that object (e.g. table) names get truncated 
  in some places and not others.  If you create a table with a long name, 
  PG truncates the name and creates a table with the shorter name; but 
  if you refer to the table by the same long name, PG reports an error.
 
 Example please?  This is clearly a bug.  

Sorry, false alarm.  When I got the test case, it turned out to
be the more familiar problem:

  create table foo_..._bar1 (id1 ...);
[notice, "foo_..._bar1" truncated to "foo_..._bar"]
  create table foo_..._bar (id2 ...);
[error, foo_..._bar already exists]
  create index foo_..._bar_ix on foo_..._bar(id2);
[notice, "foo_..._bar_ix" truncated to "foo_..._bar"]
[error, foo_..._bar already exists]
[error, attribute "id2" not found]
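The collision arises because all three long names map to the same truncated catalog name. In Python terms (assuming the then-usual NAMEDATALEN of 32, i.e. 31 usable characters -- an assumption; the limit is a compile-time setting):

```python
NAMEDATALEN = 32  # assumed compile-time setting; usable length is one less

def truncate_name(name, max_len=NAMEDATALEN - 1):
    """Mimic PG's silent identifier truncation."""
    return name[:max_len]

base = "foo_" + "x" * 26 + "_bar"   # 34 chars, over the limit
table1 = base + "1"
index = base + "_ix"

# All three identifiers collapse to the same catalog name:
assert truncate_name(table1) == truncate_name(base) == truncate_name(index)
```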

It would be more helpful for the first "create" to fail so we don't 
end up cluttered with objects that shouldn't exist, and which interfere
with operations on objects which should.

But I'm not proposing that for 7.1.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Truncation of object names

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 02:54:47PM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  Sorry, false alarm.  When I got the test case, it turned out to
  be the more familiar problem:
 
create table foo_..._bar1 (id1 ...);
  [notice, "foo_..._bar1" truncated to "foo_..._bar"]
create table foo_..._bar (id2 ...);
  [error, foo_..._bar already exists]
create index foo_..._bar_ix on foo_..._bar(id2);
  [notice, "foo_..._bar_ix" truncated to "foo_..._bar"]
  [error, foo_..._bar already exists]
  [error, attribute "id2" not found]
 
  It would be more helpful for the first "create" to fail so we don't 
  end up cluttered with objects that shouldn't exist, and which interfere
  with operations on objects which should.
 
 Seems to me that if you want a bunch of CREATEs to be mutually
 dependent, then you wrap them all in a BEGIN/END block.

Yes, but...  The second and third commands weren't supposed to be 
related to the first at all, never mind dependent on it.  They were 
made dependent by PG crushing the names together.

We are thinking about working around the name length limitation 
(encountered in migrating from other dbs) by allowing "foo.bar.baz" 
name syntax, as a sort of rudimentary namespace mechanism.  It ain't
schemas, but it's better than "foo__bar__baz".

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Anyone have any good addresses ... ?

2001-04-13 Thread Nathan Myers

On Fri, Apr 13, 2001 at 06:32:26PM -0400, Trond Eivind Glomsrød wrote:
 The Hermit Hacker [EMAIL PROTECTED] writes:
 
  Here is what we've always sent to to date ... anyone have any good ones
  to add?
  
  
  Addresses : [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED],
  [EMAIL PROTECTED]
 
 Freshmeat, linuxtoday. If the release includes RPMs for Red Hat Linux,
 redhat-announce is also a suitable location.

Linux Journal: [EMAIL PROTECTED]
Freshmeat:  [EMAIL PROTECTED]
LinuxToday: http://linuxtoday.com/contribute.php3

-- 
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Re: Hand written parsers

2001-04-12 Thread Nathan Myers

On Wed, Apr 11, 2001 at 10:44:59PM -0700, Ian Lance Taylor wrote:
 Mark Butler [EMAIL PROTECTED] writes:
  ...
  The advantages of using a hand written recursive descent parser lie in
  1) ease of implementing grammar changes 
  2) ease of debugging
  3) ability to handle unusual cases
  4) ability to support context sensitive grammars
  ...
  Another nice capability is the ability to enable and disable grammar
  rules at run time ...

 On the other hand, recursive descent parsers tend to be more ad hoc,
 they tend to be harder to maintain, and they tend to be less
 efficient.  ...  And I note that despite the
 difficulties, the g++ parser is yacc based.

Yacc and yacc-like programs are most useful when the target grammar (or 
your understanding of it) is not very stable.  With Yacc you can make 
sweeping changes much more easily; big changes can be a lot of work in 
a hand-coded parser.  Once your grammar stabilizes, though, hand coding 
can provide flexibility that is inconceivable in a parser generator, 
albeit at some cost in speed and compact description.  (I doubt parser 
speed is an issue for PG.)

G++ has flirted seriously with switching to a recursive-descent parser,
largely to be able to offer meaningful error messages and to recover
better from errors, as well as to be able to parse some problematic
but conformant (if unlikely) programs.

Note that the choice is not just between Yacc and a hand-coded parser.
Since Yacc, many more powerful parser generators have been released,
one of which might be just right for PG.
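As a concrete (if toy) illustration of hand coding: a recursive-descent parser is just one mutually recursive function per grammar rule, so context-sensitive tweaks or run-time rule switches are ordinary `if` statements. This sketch parses simple arithmetic, nothing like PG's grammar:

```python
# Grammar: expr := term (('+'|'-') term)* ; term := NUMBER | '(' expr ')'
import re

def tokenize(s):
    return re.findall(r"\d+|[-+()]", s)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            # Hand-written parsers can say exactly what went wrong here.
            raise SyntaxError("expected %r, got %r" % (tok, self.peek()))
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected number or '(', got %r" % tok)
        self.pos += 1
        return int(tok)

assert Parser(tokenize("1 + (2 - 3) + 10")).expr() == 10
```

The error messages above are the point: each failure site knows its full context, which is just what a table-driven parser struggles to report.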

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Truncation of char, varchar types

2001-04-09 Thread Nathan Myers

On Mon, Apr 09, 2001 at 09:20:42PM +0200, Peter Eisentraut wrote:
 Excessively long values are currently silently truncated when they are
 inserted into char or varchar fields.  This makes the entire notion of
 specifying a length limit for these types kind of useless, IMO.  Needless
 to say, it's also not in compliance with SQL.
 
 How do people feel about changing this to raise an error in this
 situation?  Does anybody rely on silent truncation?  Should this be
 user-settable, or can those people resort to using triggers?

Yes, detecting and reporting errors early is a Good Thing.  You don't 
do anybody any favors by pretending to save data, but really throwing 
it away.

We have noticed here also that object (e.g. table) names get truncated 
in some places and not others.  If you create a table with a long name, 
PG truncates the name and creates a table with the shorter name; but 
if you refer to the table by the same long name, PG reports an error.  
(Very long names may show up in machine-generated schemas.) Would 
patches for this, e.g. to refuse to create a table with an impossible 
name, be welcome?  

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 04:25:42PM -0400, Ken Hirsch wrote:
TODO updated.  I know we did number 2, but did we agree on #1 and is it done?
  
   #2 is indeed done.  #1 is not done, and possibly not agreed to ---
   I think Vadim had doubts about its usefulness, though personally I'd
   like to see it.
 
  That was my recollection too.  This was the discussion about testing the
  disk hardware.  #1 removed.
 
 What is recommended in the bible (Gray and Reuter), especially for larger
 disk block sizes that may not be written atomically, is to have a word at
the end of the block that must match a word at the beginning of the block.  It
 gets changed each time you write the block.

That only works if your blocks are atomic.  Even SCSI disks reorder
sector writes, and they are free to write the first and last sectors
of an 8k-32k block, and not have written the intermediate sectors 
before the power goes out.  On IDE disks it is of course far worse.
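The Gray & Reuter marker technique quoted above, and its blind spot, can be sketched in a few lines (illustrative only; word size and layout are invented here):

```python
import struct

WORD = 4  # marker word size in bytes

def stamp_block(payload, write_counter):
    """Gray & Reuter-style marker: the same counter word at both
    ends of the block, bumped on every write."""
    word = struct.pack("<I", write_counter & 0xFFFFFFFF)
    return word + payload + word

def looks_torn(block):
    """A head/tail mismatch proves the write was interrupted, but
    matching words prove nothing about the sectors in between --
    the objection raised above."""
    return block[:WORD] != block[-WORD:]

good = stamp_block(b"x" * 24, 7)
assert not looks_torn(good)

# Torn write: new head word landed, old tail word still on disk.
torn = stamp_block(b"x" * 24, 8)[:-WORD] + struct.pack("<I", 7)
assert looks_torn(torn)

# Middle sectors trashed, head and tail intact: goes undetected.
silent = good[:WORD] + b"\0" * 24 + good[-WORD:]
assert not looks_torn(silent)
```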

(On many (most?) IDE drives, even when they have been told to report 
write completion only after data is physically on the platter, they will 
"forget" if they see activity that looks like benchmarking.  Others just 
ignore the command, and in any case they all default to unsafe mode.)

If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects?  Maybe 
the objections could be dealt with, and everyone satisfied.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 02:27:48PM -0700, Mikheev, Vadim wrote:
  If the reason that a block CRC isn't on the TODO list is that Vadim
  objects, maybe we should hear some reasons why he objects?  Maybe 
  the objections could be dealt with, and everyone satisfied.
 
 Unordered disk writes are covered by backing up modified blocks
 in log. It allows not only catch such writes, as would CRC do,
 but *avoid* them.
 
 So, for what CRC could be used? To catch disk damages?
 Disk has its own CRC for this.

OK, this was already discussed, maybe while Vadim was absent.  
Should I re-post the previous text?

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Re: TODO list

2001-04-05 Thread Nathan Myers

On Thu, Apr 05, 2001 at 06:25:17PM -0400, Tom Lane wrote:
 "Mikheev, Vadim" [EMAIL PROTECTED] writes:
  If the reason that a block CRC isn't on the TODO list is that Vadim
  objects, maybe we should hear some reasons why he objects?  Maybe 
  the objections could be dealt with, and everyone satisfied.
 
  Unordered disk writes are covered by backing up modified blocks
  in log. It allows not only catch such writes, as would CRC do,
  but *avoid* them.
 
  So, for what CRC could be used? To catch disk damages?
  Disk has its own CRC for this.
 
 Blocks that have recently been written, but failed to make it down to
 the disk platter intact, should be restorable from the WAL log.  So we
 do not need a block-level CRC to guard against partial writes.

If a block is missing some sectors in the middle, how would you know
to reconstruct it from the WAL, without a block CRC telling you that
the block is corrupt?
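A whole-block CRC catches exactly that case: damage anywhere in the block, including the middle sectors, changes the checksum. A minimal sketch with CRC-32 (PG had no block checksums at the time; the layout below is invented for illustration):

```python
import struct
import zlib

def write_block(payload):
    """Prepend a CRC-32 of the payload to the block image."""
    return struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF) + payload

def block_is_intact(block):
    stored, = struct.unpack("<I", block[:4])
    return stored == (zlib.crc32(block[4:]) & 0xFFFFFFFF)

block = write_block(b"eight kilobytes of page data...")
assert block_is_intact(block)

# Corrupt one byte in the middle: head and tail untouched,
# but the CRC still catches it, so the page can be restored from WAL.
damaged = block[:10] + b"\0" + block[11:]
assert not block_is_intact(damaged)
```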

 
 A block-level CRC might be useful to guard against long-term data
 lossage, but Vadim thinks that the disk's own CRCs ought to be
 sufficient for that (and I can't say I disagree).

The people who make the disks don't agree.  

They publish the error rate they guarantee, and they meet it, more 
or less.  They publish a rate that is _just_ low enough to satisfy 
noncritical requirements (on the correct assumption that they can't 
satisfy critical requirements in any case) and high enough not to 
interfere with benchmarks.  They assume that if you need better 
reliability you can and will provide it yourself, and rely on their 
CRC only as a performance optimization.

At the raw sector level, they get (and correct) errors very frequently; 
when they are not getting "enough" errors, they pack the bits more 
densely until they do, and sell a higher-density drive.

 So the only real benefit of a block-level CRC would be to guard against
 bits dropped in transit from the disk surface to someplace else, ie,
 during read or during a "cp -r" type copy of the database to another
 location.  That's not a totally negligible risk, but is it worth the
 overhead of updating and checking block CRCs?  Seems dubious at best.

Vadim didn't want to re-open this discussion until after 7.1 is out
the door, but that "dubious at best" demands an answer.  See the archive 
posting:

http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/msg00473.html

...

Incidentally, is the page at 

  http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/

the best place to find old messages?  It's never worked right for me.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Final call for platform testing

2001-04-03 Thread Nathan Myers

On Tue, Apr 03, 2001 at 03:31:25PM +, Thomas Lockhart wrote:
 
 OK. So we are close to a final tally of supported machines.
 ...
 Here are the up-to-date platforms:
 
 AIX 4.3.3 RS6000   7.1 2001-03-21, Gilles Darold
 BeOS 5.0.4 x86 7.1 2000-12-18, Cyril Velter
 BSDI 4.01  x86 7.1 2001-03-19, Bruce Momjian
 Compaq Tru64 4.0g Alpha 7.1 2001-03-19, Brent Verner
 FreeBSD 4.3 x867.1 2001-03-19, Vince Vielhaber
 HPUX PA-RISC   7.1 2001-03-19, 10.20 Tom Lane, 11.00 Giles Lean
 IRIX 6.5.11 MIPS   7.1 2001-03-22, Robert Bruccoleri
 Linux 2.2.x Alpha  7.1 2001-01-23, Ryan Kirkpatrick
 Linux 2.2.x armv4l 7.1 2001-03-22, Mark Knox
 Linux 2.0.x MIPS   7.1 2001-03-30, Dominic Eidson
 Linux 2.2.18 PPC74xx 7.1 2001-03-19, Tom Lane
 Linux 2.2.x S/390  7.1 2000-11-17, Neale Ferguson
 Linux 2.2.15 Sparc 7.1 2001-01-30, Ryan Kirkpatrick
 Linux 2.2.16 x86   7.1 2001-03-19, Thomas Lockhart
 MacOS X Darwin PPC 7.1 2000-12-11, Peter Bierman
 NetBSD 1.5 Alpha   7.1 2001-03-22, Giles Lean
 NetBSD 1.5E arm32  7.1 2001-03-21, Patrick Welche
 NetBSD m68k7.0 2000-04-10 (Henry has lost machine)
 NetBSD Sparc   7.0 2000-04-13, Tom I. Helbekkmo
 NetBSD VAX 7.1 2001-03-30, Tom I. Helbekkmo
 NetBSD 1.5 x86 7.1 2001-03-23, Giles Lean
 OpenBSD 2.8 Sparc  7.1 2001-03-23, Brandon Palmer
 OpenBSD 2.8 x867.1 2001-03-22, Brandon Palmer
 SCO OpenServer 5 x86   7.1 2001-03-13, Billy Allie
 SCO UnixWare 7.1.1 x86 7.1 2001-03-19, Larry Rosenman
 Solaris 2.7-8 Sparc7.1 2001-03-22, Marc Fournier
 Solaris x867.1 2001-03-27, Mathijs Brands
 SunOS 4.1.4 Sparc  7.1 2001-03-23, Tatsuo Ishii
 WinNT/Cygwin x86   7.1 2001-03-16, Jason Tishler
 
 And the "unsupported platforms":
 
 DGUX m88k
 MkLinux DR1 PPC750 7.0 2000-04-13, Tatsuo Ishii
 NextStep x86
 QNX 4.25 x86   7.0 2000-04-01, Dr. Andreas Kardos
 System V R4 m88k
 System V R4 MIPS
 Ultrix MIPS7.1 2001-03-26, Alexander Klimov
 Windows/Win32 x86  7.1 2001-03-26, Magnus Hagander (clients only)

I saw three separate reports of successful builds on Linux 2.4.2 on x86
(including mine), but it isn't listed here.  

-- 
Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Re: Final call for platform testing

2001-04-03 Thread Nathan Myers

On Tue, Apr 03, 2001 at 11:19:04PM +, Thomas Lockhart wrote:
  I saw three separate reports of successful builds on Linux 2.4.2 on x86
  (including mine), but it isn't listed here.
 
 It is listed in the comments in the real docs. At least one report was
 for an extensively patched 2.4.2, and I'm not sure of the true lineage
 of the others.

You could ask.  Just to ignore reports that you have asked for is not 
polite.  My report was based on a virgin, unpatched 2.4.2 kernel, and 
(as noted) the Debian-packaged glibc-2.2.2.  

If you are trying to trim your list, it would be reasonable to drop 
Linux-2.0.x, because that version is not being maintained any more.

 I *could* remove the version info from the x86 listing, and mention both
 2.2.x and 2.4.x in the comments.

Linux-2.2 and Linux-2.4 are different codebases.  It is worth noting,
besides, the glibc-version tested along with each Linux kernel version.

Nathan Myers
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Re: Changing the default value of an inherited column

2001-04-02 Thread Nathan Myers

On Sun, Apr 01, 2001 at 03:15:56PM -0400, Tom Lane wrote:
 Christopher Masto [EMAIL PROTECTED] writes:
  Another thing that seems kind of interesting would be to have:
  CREATE TABLE base (table_id CHAR(8) NOT NULL [, etc.]);
  CREATE TABLE foo  (table_id CHAR(8) NOT NULL DEFAULT 'foo');
  CREATE TABLE bar  (table_id CHAR(8) NOT NULL DEFAULT 'bar');
  Then a function on "base" could look at table_id and know which
  table it's working on.  A waste of space, but I can think of
  uses for it.
 
 This particular need is superseded in 7.1 by the 'tableoid'
 pseudo-column.  However you can certainly imagine variants of this
 that tableoid doesn't handle, for example columns where the subtable
 creator can provide a useful-but-not-always-correct default value.

A bit of O-O doctrine... when you find yourself tempted to do something 
like the above, it usually means you're trying to do the wrong thing.  
You may not have a choice, in some cases, but you should know you are 
on the way to architecture meltdown.  "She'll blow, Cap'n!"

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Re: Changing the default value of an inherited column

2001-04-02 Thread Nathan Myers

On Mon, Apr 02, 2001 at 01:27:06PM -0400, Tom Lane wrote:
 Philip: the rule that pg_dump needs to apply w.r.t. defaults for
 inherited fields is that if an inherited field has a default and
 either (a) no parent table supplies a default, or (b) any parent
 table supplies a default different from the child's, then pg_dump
 had better emit the child field explicitly.

The rule above appears to work even if inherited-default conflicts 
are not taken as an error, but just result in a derived-table column 
with no default.
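Tom's rule above can be sketched as a small predicate (a hypothetical
illustration with invented names, not pg_dump's actual code):

```python
def must_emit_child_default(child_default, parent_defaults):
    """Decide whether pg_dump must emit a child column's DEFAULT
    explicitly.  Per the rule: emit it if the child has a default and
    either (a) no parent supplies a default, or (b) any parent
    supplies a default different from the child's."""
    if child_default is None:
        return False
    if not any(d is not None for d in parent_defaults):
        return True   # (a) no parent supplies a default
    return any(d is not None and d != child_default
               for d in parent_defaults)   # (b) some parent default differs

# Child inherits from a parent that has no default at all: must emit.
assert must_emit_child_default("'foo'", [None]) is True
# Parent supplies the same default: nothing to emit.
assert must_emit_child_default("'foo'", ["'foo'"]) is False
# One parent agrees, another conflicts: must emit.
assert must_emit_child_default("'foo'", ["'foo'", "'bar'"]) is True
```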

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] MIPS test-and-set

2001-03-26 Thread Nathan Myers

On Mon, Mar 26, 2001 at 07:09:38PM -0500, Tom Lane wrote:
 Thomas Lockhart [EMAIL PROTECTED] writes:
  That is not already available from the Irix support code?
 
 What we have for IRIX is
 ... 
 Doesn't look to me like it's likely to work on anything but IRIX ...

I have attached linuxthreads/sysdeps/mips/pt-machine.h from glibc-2.2.2
below.  (Glibc linuxthreads has alpha, arm, hppa, i386, ia64, m68k, mips,
powerpc, s390, SH, and SPARC support, at least to some degree.)

Since the actual instruction sequence is probably lifted from the 
MIPS manual, it's probably much freer than GPL.  For the paranoid,
the actual instructions, extracted, are just

   1:
 ll   %0,%3
 bnez %0,2f
  li  %1,1
 sc   %1,%2
 beqz %1,1b
   2:

Nathan Myers
[EMAIL PROTECTED]

---
/* Machine-dependent pthreads configuration and inline functions.

   Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Ralf Baechle [EMAIL PROTECTED].
   Based on the Alpha version by Richard Henderson [EMAIL PROTECTED].

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If
   not, write to the Free Software Foundation, Inc.,
   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */

#include <sgidefs.h>
#include <sys/tas.h>

#ifndef PT_EI
# define PT_EI extern inline
#endif

/* Memory barrier.  */
#define MEMORY_BARRIER() __asm__ ("" : : : "memory")


/* Spinlock implementation; required.  */

#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

PT_EI long int
testandset (int *spinlock)
{
  long int ret, temp;

  __asm__ __volatile__
    ("/* Inline spinlock test & set */\n\t"
     "1:\n\t"
     "ll\t%0,%3\n\t"
     ".set  push\n\t"
     ".set  noreorder\n\t"
     "bnez  %0,2f\n\t"
     " li   %1,1\n\t"
     ".set  pop\n\t"
     "sc\t%1,%2\n\t"
     "beqz  %1,1b\n"
     "2:\n\t"
     "/* End spinlock test & set */"
     : "=r" (ret), "=r" (temp), "=m" (*spinlock)
     : "m" (*spinlock)
     : "memory");

  return ret;
}

#else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */

PT_EI long int
testandset (int *spinlock)
{
  return _test_and_set (spinlock, 1);
}
#endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */


/* Get some notion of the current stack.  Need not be exactly the top
   of the stack, just something somewhere in the current frame.  */
#define CURRENT_STACK_FRAME  stack_pointer
register char * stack_pointer __asm__ ("$29");


/* Compare-and-swap for semaphores. */

#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

#define HAS_COMPARE_AND_SWAP
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
  long int ret;

  __asm__ __volatile__
    ("/* Inline compare & swap */\n\t"
     "1:\n\t"
     "ll\t%0,%4\n\t"
     ".set  push\n"
     ".set  noreorder\n\t"
     "bne   %0,%2,2f\n\t"
     " move %0,%3\n\t"
     ".set  pop\n\t"
     "sc\t%0,%1\n\t"
     "beqz  %0,1b\n"
     "2:\n\t"
     "/* End compare & swap */"
     : "=r" (ret), "=m" (*p)
     : "r" (oldval), "r" (newval), "m" (*p)
     : "memory");

  return ret;
}

#endif /* (_MIPS_ISA >= _MIPS_ISA_MIPS2) */




Re: [HACKERS] Banner links not working (fwd)

2001-03-12 Thread Nathan Myers

On Mon, Mar 12, 2001 at 08:05:26PM +, Peter Mount wrote:
 At 11:41 12/03/01 -0500, Vince Vielhaber wrote:
 On Mon, 12 Mar 2001, Peter Mount wrote:
 
   Bottom of every page (part of the template) is both my name and email
   address ;-)
 
 Can we slightly enlarge the font?
 
 Can do. What size do you think is best?
 
 I've always used size=1 for that line...

Absolute font sizes in HTML are always a mistake.  size="-1" would do.

--
Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Internationalized dates (was Internationalized error messages)

2001-03-12 Thread Nathan Myers

On Mon, Mar 12, 2001 at 11:11:46AM +0100, Karel Zak wrote:
 On Fri, Mar 09, 2001 at 10:58:02PM +0100, Kaare Rasmussen wrote:
  Now you're talking about i18n, maybe someone could think about input and
  output of dates in local language.
  
  As fas as I can tell, PostgreSQL will only use English for dates, eg January,
  February and weekdays, Monday, Tuesday etc. Not the local name.
 
  May be add special mask to to_char() and use locales for this, but I not
 sure. It isn't easy -- arbitrary size of strings, to_char's cache problems
 -- more and more difficult is parsing input with locales usage. 
 The other thing is speed...
 
  A solution is use number based dates without names :-(

ISO has published a standard on date/time formats, ISO 8601.  
Dates look like "2001-03-22"; times look like "12:47:03".  
The only unfortunate feature is its combined date/time format, 
"2001-03-22T12:47:03".  To me the ISO date format is far better 
than anything involving month names.

I'd like to see ISO 8601 as the default date format.
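For illustration, ISO 8601 rendering is what Python's standard library
produces out of the box (a sketch of the format only, nothing to do
with PG's own datetime code):

```python
from datetime import datetime, timezone, timedelta

dt = datetime(2001, 3, 22, 12, 47, 3,
              tzinfo=timezone(timedelta(hours=-4)))

# Date and time rendered separately, ISO 8601 style:
assert dt.strftime("%Y-%m-%d") == "2001-03-22"
assert dt.strftime("%H:%M:%S") == "12:47:03"

# The combined form uses the 'T' separator the standard mandates:
assert dt.isoformat() == "2001-03-22T12:47:03-04:00"

# Replacing 'T' with a space gives the friendlier variant:
assert dt.isoformat(sep=" ") == "2001-03-22 12:47:03-04:00"
```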

--
Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-03-12 Thread Nathan Myers

On Mon, Mar 05, 2001 at 02:00:59PM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
  ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
 
From the README:
 
The code in this package has been derived from the BTLib package
obtained from Christian Iseli [EMAIL PROTECTED].
From his mail:
 
The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
Press.  Pages 896ff.
 
The generator polynomial is x^64 + x^4 + x^3 + x + 1.
 
 Nathan (or anyone else with a copy of "Numerical recipes in C", which
 I'm embarrassed to admit I don't own), is there any indication in there
 that anyone spent any effort on choosing that particular generator
 polynomial?  As far as I can see, it violates one of the standard
 guidelines for choosing a polynomial, namely that it be a multiple of
 (x + 1) ... which in modulo-2 land is equivalent to having an even
 number of terms, which this ain't got.  See Ross Williams'
 A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
 ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
 by far the most thorough and readable thing I've ever seen on CRCs.
 
 I spent some time digging around the net for standard CRC64 polynomials,
 and the only thing I could find that looked like it might have been
 picked by someone who understood what they were doing is in the DLT
 (digital linear tape) standard, ECMA-182 (available from
 http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):
 
 x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
 x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
 x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
 x^7 + x^4 + x + 1

I'm sorry to have taken so long to reply.  

The polynomial chosen for SWISS-PROT turns out to be presented, in 
Numerical Recipes, just as an example of a primitive polynomial of 
that degree; no assertion is made about its desirability for error 
checking.  It is (in turn) drawn from E. J. Watson, "Mathematics of 
Computation", vol. 16, pp368-9.

Having (x + 1) as a factor guarantees to catch all errors in which
an odd number of bits have been changed.  Presumably you are then
infinitesimally less likely to catch all errors in which an even 
number of bits have been changed.

I would have posted the ECMA-182 polynomial if I had found it.  (That 
was good searching!)  One hopes that the ECMA polynomial was chosen more 
carefully than entirely at random.  High-degree codes are often chosen 
by Monte Carlo methods, by applying statistical tests to randomly-chosen 
values, because the search space is so large.

I have verified that Tom transcribed the polynomial correctly from
the PDF image.  The ECMA document doesn't say whether their polynomial
is applied "bit-reversed", but the check would be equally strong either
way.
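A bit-serial sketch of a CRC-64 over the ECMA-182 generator follows
(my own illustration, not PG's code; a production version would be
table-driven).  Note the parity check: the polynomial has an even
number of terms, so (x + 1) divides it and every odd-weight error is
caught:

```python
ECMA_182_POLY = 0x42F0E1EBA9EA3693   # low 64 bits; the x^64 term is implicit

def crc64_ecma(data: bytes, crc: int = 0) -> int:
    """Bit-serial CRC-64, non-reflected, zero initial value."""
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ ECMA_182_POLY) & 0xFFFFFFFFFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFFFFFFFFFF
    return crc

# 33 low-order terms plus the implicit x^64 = 34 terms, an even count,
# hence divisible by (x + 1):
assert bin(ECMA_182_POLY).count("1") + 1 == 34

msg = bytearray(b"some WAL record payload")
good = crc64_ecma(bytes(msg))
msg[3] ^= 0x01                        # flip one bit (odd-weight error)
assert crc64_ecma(bytes(msg)) != good
```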

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] WAL SHM principles

2001-03-12 Thread Nathan Myers

Sorry for taking so long to reply...

On Wed, Mar 07, 2001 at 01:27:34PM -0800, Mikheev, Vadim wrote:
 Nathan wrote:
  It is possible to build a logging system so that you mostly don't care
  when the data blocks get written 
[after being changed, as long as they get written by an fsync];
  a particular data block on disk is 
  considered garbage until the next checkpoint, so that you 
 
 How to know if a particular data page was modified if there is no
 log record for that modification?
 (Ie how to know where is garbage? -:))

In such a scheme, any block on disk not referenced up to (and including) 
the last checkpoint is garbage: it is either blank or reflects a recent 
logged or soon-to-be-logged change.  Everything written after the 
checkpoint (other than the log itself) must therefore go into blocks not 
referenced from any on-disk structure -- except from other 
post-checkpoint blocks.

During recovery, the log contents get written to those pages during
startup. Blocks that actually got written before the crash are not
changed by being overwritten from the log, but that's ok. If they got
written before the corresponding log entry, too, nothing references
them, so they are considered blank.

  might as well allow the blocks to be written any time,
  even before the log entry.
 
 And what to do with index tuples pointing to unupdated heap pages
 after that?

Maybe index pages are cached in shm and copied to mmapped blocks 
after it is ok for them to be written.

What platforms does PG run on that don't have mmap()?

Nathan Myers
[EMAIL PROTECTED]




[HACKERS] doxygen PG

2001-03-10 Thread Nathan Myers

Is this page 

  http://members.fortunecity.com/nymia/postgres/dox/backend/html/

common knowledge?  It appears to be an automatically-generated
cross-reference documentation web site.  My impression is that
appropriately-marked comments in the code get extracted to the 
web pages, too, so it is also a way to automate internal 
documentation.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] doxygen PG

2001-03-10 Thread Nathan Myers

On Sat, Mar 10, 2001 at 06:29:37PM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  Is this page 
http://members.fortunecity.com/nymia/postgres/dox/backend/html/
  common knowledge?
 
 Interesting, but bizarrely incomplete.  (Yeah, we have only ~100
 struct types ... sure ...)

It does say "version 0.0.1".  

What was interesting to me is that the interface seems a lot more 
helpful than the current CVS web gateway.  If it were to be completed, 
and could be kept up to date automatically, something like it could 
be very useful.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Internationalized error messages

2001-03-09 Thread Nathan Myers

On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote:
  Gettext takes care of this.  In the source you'd write
 
  elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"),
  string, string);
 
 Duh.  For some reason I was envisioning the localization substitution as
 occurring on the client side, but of course we'd want to do it on the
 server side, and before parameters are substituted into the message.
 Sorry for the noise.
 
 I am not sure we can/should use gettext (possible license problems?),
 but certainly something like this could be cooked up.

I've been assuming that PG's needs are specialized enough that the
project wouldn't use gettext directly, but instead something inspired 
by it.  

If you look at my last posting on the subject, by the way, you will see 
that it could work without a catalog underneath; integrating a catalog 
would just require changes in a header file (and the programs to generate 
the catalog, of course).  That quality seems to me essential to allow the 
changeover to be phased in gradually, and to allow different underlying 
catalog implementations to be tried out.
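A gettext-inspired lookup along those lines can be sketched in a few
lines (everything here -- the catalog shape, the function names, the
sample translation -- is invented for illustration):

```python
# Messages are keyed by their English text; the error code travels
# separately.  A missing entry falls through to English untouched.
CATALOG = {
    "fr": {
        "type mismatch in CASE expression (%s vs %s)":
            "types incompatibles dans l'expression CASE (%s vs %s)",
    },
}

def translate(lang: str, msg: str) -> str:
    return CATALOG.get(lang, {}).get(msg, msg)

def elog(lang: str, code: str, msg: str, *args) -> str:
    # Substitute parameters only *after* looking up the translation.
    return "%s: %s" % (code, translate(lang, msg) % args)

assert elog("fr", "2200G",
            "type mismatch in CASE expression (%s vs %s)",
            "int4", "text").startswith("2200G: types incompatibles")
# No German entry: the English text survives, parameters intact.
assert "int4" in elog("de", "2200G",
                      "type mismatch in CASE expression (%s vs %s)",
                      "int4", "text")
```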

Nathan
ncm




Re: [HACKERS] Use SIGQUIT instead of SIGUSR1?

2001-03-08 Thread Nathan Myers

On Thu, Mar 08, 2001 at 04:06:16PM -0500, Tom Lane wrote:
 To implement the idea of performing a checkpoint after every so many
 XLOG megabytes (as well as after every so many seconds), I need to pick
 an additional signal number for the postmaster to accept.  Seems like
 the most appropriate choice for this is SIGUSR1, which isn't currently
 being used at the postmaster level.
 
 However, if I just do that, then SIGUSR1 and SIGQUIT will have
 completely different meanings for the postmaster and for the backends,
 in fact SIGQUIT to the postmaster means send SIGUSR1 to the backends.
 This seems hopelessly confusing.
 
 I think it'd be a good idea to change the code so that SIGQUIT is the
 per-backend quickdie() signal, not SIGUSR1, to bring the postmaster and
 backend signals back into some semblance of agreement.
 
 For the moment we could leave the backends also accepting SIGUSR1 as
 quickdie, just in case someone out there is in the habit of sending
 that signal manually to individual backends.  Eventually backend SIGUSR1
 might be reassigned to mean something else.  (I suspect Bruce is
 coveting it already ;-).)

The number and variety of signals used in PG is already terrifying.

Attaching a specific meaning to SIGQUIT may be dangerous if the OS and 
its daemons also send SIGQUIT to mean something subtly different.  I'd 
rather see a reduction in the use of signals, and a movement toward more 
modern, better behaved interprocess communication mechanisms.  Still, 
"if it were done when 'tis done, then 'twere well It were done" cleanly.

--
Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Proposed WAL changes

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 11:09:25AM -0500, Tom Lane wrote:
 "Vadim Mikheev" [EMAIL PROTECTED] writes:
  * Store two past checkpoint locations, not just one, in pg_control.
  On startup, we fall back to the older checkpoint if the newer one
  is unreadable.  Also, a physical copy of the newest checkpoint record
 
  And what to do if older one is unreadable too?
  (Isn't it like using 2 x CRC32 instead of CRC64 ? -:))
 
 Then you lose --- but two checkpoints gives you twice the chance of
 recovery (probably more, actually, since it's much more likely that
 the previous checkpoint will have reached disk safely).

Actually far more: if the checkpoints are minutes apart, even the 
worst disk drive will certainly have flushed any blocks written for 
the earlier checkpoint.

--
Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] WAL SHM principles

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 11:21:37AM -0500, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  The only problem is that we would no longer have control over which
  pages made it to disk.  The OS would perhaps write pages as we modified
  them.  Not sure how important that is.
 
 Unfortunately, this alone is a *fatal* objection.  See nearby
 discussions about WAL behavior: we must be able to control the relative
 timing of WAL write/flush and data page writes.

Not so fast!

It is possible to build a logging system so that you mostly don't care
when the data blocks get written; a particular data block on disk is 
considered garbage until the next checkpoint, so that you might as well 
allow the blocks to be written any time, even before the log entry.

Letting the OS manage sharing of disk block images via mmap should be 
an enormous win vs. a fixed shm and manual scheduling by PG.  If that
requires changes in the logging protocol, it's worth it.

(What supported platforms don't have mmap?)
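The sharing behavior mmap provides for free can be shown in a few
lines (Python purely for illustration; PG's buffer manager would of
course be C, and two mappings in one process stand in for two
backends):

```python
import mmap
import os
import tempfile

# Create a one-page file and map it twice, standing in for two
# processes sharing a disk block through the OS page cache.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 8192)

m1 = mmap.mmap(fd, 8192)    # MAP_SHARED is the default
m2 = mmap.mmap(fd, 8192)

m1[0:5] = b"hello"          # write through one mapping...
assert m2[0:5] == b"hello"  # ...the other view sees it immediately

m1.flush()                  # msync(): force the page to disk on our schedule

for m in (m1, m2):
    m.close()
os.close(fd)
os.unlink(path)
```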

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Proposed WAL changes

2001-03-07 Thread Nathan Myers

On Wed, Mar 07, 2001 at 12:03:41PM -0800, Mikheev, Vadim wrote:
 Ian wrote:
   I feel that the fact that
   
   WAL can't help in the event of disk errors
   
   is often overlooked.
  
  This is true in general.  But, nevertheless, WAL can be written to
  protect against predictable disk errors, when possible.  Failing to
  write a couple of disk blocks when the system crashes 

or, more likely, when power drops; a system crash shouldn't keep the
disk from draining its buffers ...

  is a reasonably predictable disk error.  WAL should ideally be 
  written to work correctly in that situation.
 
 But what can be done if fsync returns before pages flushed?

Just what Tom has done: preserve a little more history.  If it's not
too expensive, then it doesn't hurt you when running on sound hardware,
but it offers a good chance of preventing embarrassments for (the 
overwhelming fraction of) users on garbage hardware.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Red Hat bashing

2001-03-06 Thread Nathan Myers

On Tue, Mar 06, 2001 at 04:20:13PM -0500, Lamar Owen wrote:
 Nathan Myers wrote:
  That is why there is no problem with version skew in the syscall
  argument structures on a correctly-configured Linux system.  (On a
  Red Hat system it is very easy to get them out of sync, but RH fans
  are used to problems.)
 
 Is RedHat bashing really necessary here? 

I recognize that my last seven words above contributed nothing.
In the future I will only post strictly factual statements about
Red Hat and similarly charged topics, and keep the opinions to
myself.  I value the collegiality of this list too much to risk 
it further.  I offer my apologies for violating it.

By the way... do they call Red Hat "RedHat" at Red Hat? 

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Nathan Myers

On Mon, Mar 05, 2001 at 08:55:41PM -0500, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  killproc should send a kill -15 to the process, wait a few seconds for
  it to exit.  If it does not, try kill -1, and if that doesn't kill it,
  then kill -9.
 
 Tell it to the Linux people ... this is their boot-script code we're
 talking about.

Not to be a zealot, but this isn't _Linux_ boot-script code, it's
_Red Hat_ boot-script code.  Red Hat would like for us all to confuse
the two, but they jes' ain't the same.  (As a rule of thumb, where it
works right, credit Linux; where it doesn't, blame Red Hat. :-)
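The escalation Bruce describes can be sketched as follows (a POSIX
sketch with invented timings, not the actual init-script code; and of
course the whole point of the thread is that -9 should be a last
resort, never the first move):

```python
import signal
import subprocess

def killproc(proc: subprocess.Popen, grace: float = 2.0) -> None:
    """Escalate: SIGTERM, wait a little; then SIGHUP; finally SIGKILL."""
    for sig in (signal.SIGTERM, signal.SIGHUP, signal.SIGKILL):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
            return                    # it exited; stop escalating
        except subprocess.TimeoutExpired:
            continue                  # still alive; escalate

p = subprocess.Popen(["sleep", "60"])
killproc(p, grace=0.5)                # sleep dies at the SIGTERM step
assert p.poll() is not None
```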

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] WAL RC1 status

2001-03-02 Thread Nathan Myers

On Fri, Mar 02, 2001 at 10:54:04AM -0500, Bruce Momjian wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   Is there a version number in the WAL file?
  
  catversion.h will do fine, no?
  
   Can we put conditional code in there to create
   new log file records with an updated format?
  
  The WAL stuff is *far* too complex already.  I've spent a week studying
  it and I only partially understand it.  I will not consent to trying to
  support multiple log file formats concurrently.
 
 Well, I was thinking a few things.  Right now, if we update the
 catversion.h, we will require a dump/reload.  If we can update just the
 WAL version stamp, that will allow us to fix WAL format problems without
 requiring people to dump/reload.  I can imagine this would be valuable
 if we find we need to make changes in 7.1.1, where we can not require
 dump/reload.

It Seems to Me that after an orderly shutdown, the WAL files should be, 
effectively, slag -- they should contain no deltas from the current 
table contents.  In practice that means the only part of the format that 
*should* matter is whatever it takes to discover that they really are 
slag.

That *should* mean that, at worst, a change to the WAL file format should 
only require doing an orderly shutdown, and then (perhaps) running a simple
program to generate a new-format empty WAL.  It ought not to require an 
initdb.  

Of course the details of the current implementation may interfere with
that ideal, but it seems a worthy goal for the next beta, if it's not
possible already.  Given the opportunity to change the current WAL format, 
it ought to be possible to avoid even needing to run a program to generate 
an empty WAL.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-02-28 Thread Nathan Myers

On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
 I just took a close look at the COMP_CRC64 macro in xlog.c.
 
 This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
 on just the odd-numbered bytes and one on just the even-numbered bytes
 of the datastream.  That's hardly any stronger than a single 32-bit CRC;
 it's certainly not what I thought we had agreed to implement.
 
 We can't change this algorithm without forcing an initdb, which would be
 a rather unpleasant thing to do at this late stage of the release cycle.
 But I'm not happy with it.  Comments?

This might be a good time to update:

  The CRC-64 code used in the SWISS-PROT genetic database is (now) at:

ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz

  From the README:

  The code in this package has been derived from the BTLib package
  obtained from Christian Iseli [EMAIL PROTECTED].
  From his mail:

  The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
  B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
  Press.  Pages 896ff.

  The generator polynomial is x^64 + x^4 + x^3 + x + 1.

I would suggest that if you don't change the algorithm, at least change
the name in the sources.  Were you to #ifdef in a real crc-64, and make 
a compile-time option to select the old one, you could allow users who 
wish to avoid the initdb a way to continue with the existing pair of 
CRC-32s.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...

2001-02-28 Thread Nathan Myers

On Wed, Feb 28, 2001 at 09:17:19PM -0500, Bruce Momjian wrote:
  On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
   I just took a close look at the COMP_CRC64 macro in xlog.c.
   
   This isn't a 64-bit CRC.  It's two independent 32-bit CRCs, one done
   on just the odd-numbered bytes and one on just the even-numbered bytes
   of the datastream.  That's hardly any stronger than a single 32-bit CRC;
   it's certainly not what I thought we had agreed to implement.
   
   We can't change this algorithm without forcing an initdb, which would be
   a rather unpleasant thing to do at this late stage of the release cycle.
   But I'm not happy with it.  Comments?
  
  This might be a good time to update:
  
The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
  
  ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
  
From the README:
  
The code in this package has been derived from the BTLib package
obtained from Christian Iseli [EMAIL PROTECTED].
From his mail:
  
The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
B. P.  Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
Press.  Pages 896ff.
  
The generator polynomial is x^64 + x^4 + x^3 + x + 1.
  
  I would suggest that if you don't change the algorithm, at least change
  the name in the sources.  Were you to #ifdef in a real crc-64, and make 
  a compile-time option to select the old one, you could allow users who 
  wish to avoid the initdb a way to continue with the existing pair of 
  CRC-32s.

 Added to TODO:
 
   * Correct CRC WAL code to be normal CRC32 algorithm 

Um, how about

  * Correct CRC WAL code to be a real CRC64 algorithm

instead?

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Nathan Myers

On Sun, Feb 25, 2001 at 12:41:28AM -0500, Tom Lane wrote:
 Attached are graphs from more thorough runs of pgbench with a commit
 delay that occurs only when at least N other backends are running active
 transactions. ...
 It's not entirely clear what set of parameters is best, but it is
 absolutely clear that a flat zero-commit-delay policy is NOT best.
 
 The test conditions are postmaster options -N 100 -B 1024, pgbench scale
 factor 10, pgbench -t (transactions per client) 100.  (Hence the results
 for a single client rely on only 100 transactions, and are pretty noisy.
 The noise level should decrease as the number of clients increases.)

It's hard to interpret these results.  In particular, "delay 10k, sibs 20"
(10k,20), or cyan-triangle, is almost the same as "delay 50k, sibs 1" 
(50k,1), or green X.  Those are pretty different parameters to get such
similar results.

The only really bad performers were (0), (10k,1), (100k,20).  The best
were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
Why would 30k be a magic delay, regardless of siblings?  What happened
at 40?

At low loads, it seems (100k,1) (brown +) did best by far, which seems
very odd.  Even more odd, it did pretty well at very high loads but had 
problems at intermediate loads.  

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-24 Thread Nathan Myers

On Sat, Feb 24, 2001 at 01:07:17AM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  I see, I had it backwards: N=0 corresponds to "always delay", and 
  N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
  not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
  where M is the number of backends, or the number of backends with begun 
  transactions, or something.  N=10 would be conservative (and maybe 
  pointless) just because it would hardly ever trigger a delay.
 
 Why is N=1 not interesting?  That requires at least one other backend
 to be in a transaction before you'll delay.  That would seem to be
 the minimum useful value --- N=0 (always delay) seems clearly to be
 too stupid to be useful.

N=1 seems arbitrarily aggressive.  It assumes any open transaction will 
commit within a few milliseconds; otherwise the delay is wasted.  On a 
fairly busy system, it seems to me to impose a strict upper limit on 
transaction rate for any client, regardless of actual system I/O load.  
(N=0 would impose that strict upper limit even for a single client.)

Delaying isn't free, because it means that the client can't turn around 
and do even a cheap query for a while.  In a sense, when you delay you are 
charging the committer a tax to try to improve overall throughput.  If the 
delay lets you reduce I/O churn enough to increase the total bandwidth, 
then it was worthwhile; if not, you just cut system performance, and 
responsiveness to each client, for nothing.

The above suggests that maybe N should depend on recent disk I/O activity,
so you get a larger N (and thus less likely delay and more certain payoff) 
for a more lightly-loaded system.  On a system that has maxed its I/O 
bandwidth, clients will suffer delays anyhow, so they might as well 
suffer controlled delays that result in better total throughput.  On a 
lightly-loaded system there's no need, or payoff, for such throttling.

Can we measure disk system load by averaging the times taken for fsyncs?
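One way to act on that measurement is to scale the sibling threshold N
by the recent average fsync time (every constant below is invented for
illustration; this is a policy sketch, not PG code):

```python
from collections import deque

class CommitDelayPolicy:
    """Sketch: adapt the commit-delay sibling threshold N to disk load,
    estimated from a sliding window of recent fsync times."""
    def __init__(self, base_n=5, window=16):
        self.base_n = base_n
        self.fsync_times = deque(maxlen=window)

    def record_fsync(self, seconds: float) -> None:
        self.fsync_times.append(seconds)

    def threshold(self) -> int:
        if not self.fsync_times:
            return self.base_n
        avg = sum(self.fsync_times) / len(self.fsync_times)
        # Cheap fsyncs -> lightly loaded disk -> raise N (rarely delay);
        # expensive fsyncs -> saturated disk -> lower N (delay readily).
        if avg < 0.005:
            return self.base_n * 4
        if avg > 0.050:
            return max(1, self.base_n // 4)
        return self.base_n

    def should_delay(self, active_siblings: int) -> bool:
        return active_siblings >= self.threshold()

p = CommitDelayPolicy()
for _ in range(8):
    p.record_fsync(0.002)       # fast disk: N rises to 20
assert not p.should_delay(10)   # 10 siblings < 20: don't delay
for _ in range(16):
    p.record_fsync(0.080)       # disk saturated: N drops to 1
assert p.should_delay(2)        # delay even with few siblings
```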

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] regression test form

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 03:53:14PM -0500, Vince Vielhaber wrote:
 
 Yes there are some extra linuxes, if noone comes up with another distro
 I'll lop the extras off.  BTW, is VA Linux a distribution or just a tool
 company??

Debian is a pretty important Linux distribution, probably second only
to Red Hat in number of installations.  PG is packaged for it by 
Oliver Elphick, who is on this list.  Debian is currently supported 
on x86, SPARC, PowerPC, M68K, ARM, and Alpha architectures.

VA Linux is a hardware vendor.  They ship with any of Red Hat, Debian, 
or Suse distributions installed, per customer preference.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:
 A further refinement, still quite cheap to implement since the info is
 in the PROC struct, would be to not count backends that are blocked
 waiting for locks.  These guys are less likely to be ready to commit
 in the next few milliseconds than the guys who are actively running;
 indeed they cannot commit until someone else has committed/aborted to
 release the lock they need.
 
 Comments?  What should the threshold N be ... or do we need to make
 that a tunable parameter?

Once you make it tuneable, you're stuck with it.  You can always add
a knob later, after somebody discovers a real need.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 05:18:19PM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  Comments?  What should the threshold N be ... or do we need to make
  that a tunable parameter?
 
  Once you make it tuneable, you're stuck with it.  You can always add
  a knob later, after somebody discovers a real need.
 
 If we had a good idea what the default level should be, I'd be willing
 to go without a knob.  I'm thinking of a default of about 5 (ie, at
 least 5 other active backends to trigger a commit delay) ... but I'm not
 so confident of that that I think it needn't be tunable.  It's really
 dependent on your average and peak transaction lengths, and that's
 going to vary across installations, so unless we want to try to make it
 self-adjusting, a knob seems like a good idea.
 
 A self-adjusting delay might well be a great idea, BTW, but I'm trying
 to be conservative about how much complexity we should add right now.

When thinking about tuning N, I like to consider what are the interesting 
possible values for N:

  0: Ignore any other potential committers.
  1: The minimum possible responsiveness to other committers.
  5: Tom's guess for what might be a good choice.
  10: Harry's guess.
  ~0: Always delay.

I would rather release with N=1 than with 0, because it actually responds 
to conditions.  What N might best be, >1, probably varies on a lot of 
hard-to-guess parameters.

It seems to me that comparing various choices (and other, more interesting,
algorithms) to the N=1 case would be more productive than comparing them 
to the N=0 case, so releasing at N=1 would yield better statistics for 
actually tuning in 7.2.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 06:37:06PM -0500, Bruce Momjian wrote:
  When thinking about tuning N, I like to consider what are the interesting 
  possible values for N:
  
0: Ignore any other potential committers.
1: The minimum possible responsiveness to other committers.
5: Tom's guess for what might be a good choice.
10: Harry's guess.
~0: Always delay.
  
  I would rather release with N=1 than with 0, because it actually
  responds to conditions. What N might best be, >1, probably varies on
  a lot of hard-to-guess parameters.
 
  It seems to me that comparing various choices (and other, more
  interesting, algorithms) to the N=1 case would be more productive
  than comparing them to the N=0 case, so releasing at N=1 would yield
  better statistics for actually tuning in 7.2.

 We don't release code because it has better tuning opportunities for
 later releases. What we can do is give people parameters where the
 default is safe, and they can play and report to us.

Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
that was nevertheless preferable to N=0.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 09:05:20PM -0500, Bruce Momjian wrote:
It seems to me that comparing various choices (and other, more
interesting, algorithms) to the N=1 case would be more productive
than comparing them to the N=0 case, so releasing at N=1 would yield
better statistics for actually tuning in 7.2.
  
   We don't release code because it has better tuning opportunities for
   later releases. What we can do is give people parameters where the
   default is safe, and they can play and report to us.
  
  Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
  that was nevertheless preferable to N=0.
 
 I think zero delay is the conservative choice at this point, unless we
 hear otherwise from testers.

I see, I had it backwards: N=0 corresponds to "always delay", and 
N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
where M is the number of backends, or the number of backends with begun 
transactions, or something.  N=10 would be conservative (and maybe 
pointless) just because it would hardly ever trigger a delay.
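For illustration, the candidate formulas above as integer sketches (function names invented; which of these, if any, is the right shape is exactly what field statistics would have to show):

```c
/* Candidate load-dependent commit-delay thresholds, where m is the
 * number of backends (or backends with begun transactions). */
int threshold_half(int m) { return m / 2; }

/* Integer square root, avoiding libm. */
int threshold_sqrt(int m)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= m)
        r++;
    return r;
}

/* Integer base-2 logarithm. */
int threshold_log2(int m)
{
    int r = 0;
    while (m > 1) {
        m >>= 1;
        r++;
    }
    return r;
}
```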

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] GPL, readline, and static/dynamic linking

2001-02-22 Thread Nathan Myers

On Thu, Feb 22, 2001 at 10:50:17AM -0500, Bruce Momjian wrote:
 Let me add I don't agree with this, and find the whole GPL
 heavy-handedness very distasteful.

Please, not this again.  Is there a piss-and-moan-about-the-GPL 
schedule posted somewhere?  

Either PG is in compliance, or it's not.  Only libreadline's copyright 
holder has the right to complain if it's not.  There is no need to 
speculate; if we care about compliance, we need only ask the owner.  
If the owner says we're violating his license, then we can comply, or
negotiate, or stop using the code.  The GPL is no different from any 
other license, that way.

Complaining about the terms on something you got for nothing has to be 
the biggest waste of time and attention I've seen on this list.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Re: WAL and commit_delay

2001-02-18 Thread Nathan Myers

On Sun, Feb 18, 2001 at 11:51:50AM -0500, Tom Lane wrote:
 Adriaan Joubert [EMAIL PROTECTED] writes:
  fdatasync() is available on Tru64 and according to the man-page behaves
  as Tom expects. So it should be a win for us.
 
 Careful ... HPUX's man page also claims that fdatasync does something
 useful, but it doesn't.  I'd recommend an experiment.  Does today's
 snapshot run any faster for you (without -F) than before?

It's worth noting in documentation that systems that don't have 
fdatasync(), or that have the phony implementation, can get the same 
benefit by using a raw volume (partition) for the log file.  This 
applies even on Linux 2.0 and 2.2 without the "raw-i/o" patch.  Using 
raw volumes would have other performance benefits, even on systems 
that do fully support fdatasync, through bypassing the buffer cache.

(The above assumes I understood correctly Vadim's postings about
changes he made to support putting logs on raw volumes.)

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] WAL and commit_delay

2001-02-17 Thread Nathan Myers

On Sat, Feb 17, 2001 at 03:45:30PM -0500, Bruce Momjian wrote:
  Right now the WAL preallocation code (XLogFileInit) is not good enough
  because it does lseek to the 16MB position and then writes 1 byte there.
  On an implementation that supports holes in files (which is most Unixen)
  that doesn't cause physical allocation of the intervening space.  We'd
  have to actually write zeroes into all 16MB to ensure the space is
  allocated ... but that's just a couple more lines of code.
 
 Are OS's smart enough to not allocate zero-written blocks?  

No, but some disks are.  Writing zeroes is a bit faster on smart disks.
This has no real implications for PG, but it is one of the reasons that 
writing zeroes doesn't really wipe a disk, for forensic purposes.
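For reference, a minimal sketch of the two preallocation strategies under discussion (not the actual XLogFileInit code):

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define LOGSIZE (16 * 1024 * 1024)

/* Sparse "preallocation": seek past the end and write one byte.
 * On filesystems that support holes, the intervening 16MB is not
 * physically allocated. */
int preallocate_sparse(int fd)
{
    if (lseek(fd, LOGSIZE - 1, SEEK_SET) < 0)
        return -1;
    return write(fd, "", 1) == 1 ? 0 : -1;
}

/* Real preallocation: write zeroes over the whole range, so every
 * block is physically allocated before the WAL needs it. */
int preallocate_zeroes(int fd)
{
    char block[8192];
    int  i;

    memset(block, 0, sizeof(block));
    for (i = 0; i < LOGSIZE / (int) sizeof(block); i++)
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
            return -1;
    return 0;
}
```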

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Microsecond sleeps with select()

2001-02-17 Thread Nathan Myers

On Sat, Feb 17, 2001 at 12:26:31PM -0500, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  A comment on microsecond delays using select().  Most Unix kernels run
  at 100hz, meaning that they have a programmable timer that interrupts
  the CPU every 10 milliseconds.
 
 Right --- this probably also explains my observation that some kernels
 seem to add an extra 10msec to the requested sleep time.  Actually
 they're interpreting a one-clock-tick select() delay as "wait till
 the next clock tick, plus one tick".  The actual delay will be between
 one and two ticks depending on just when you went to sleep.
 ...
 In short: s_spincycle in its current form does not do anything anywhere
 near what the author thought it would.  It's wasted complexity.
 
 I am thinking about simplifying s_lock_sleep down to simple
 wait-one-tick-on-every-call logic.  An alternative is to keep
 s_spincycle, but populate it with, say, 1, 2 and larger entries,
 which would offer some hope of actual random-backoff behavior.
 Either change would clearly be a win on single-CPU machines, and I doubt
 it would hurt on multi-CPU machines.
 
 Comments?

I don't believe that most kernels schedule only on clock ticks.
They schedule on a clock tick *or* whenever the process yields, 
which on a loaded system may be much more frequently.

The question is whether, when scheduling, the kernel considers processes
that have requested to sleep less than a clock tick as "ready" once
their actual request time expires.  On V7 Unix, the answer was no, 
because the kernel had no way to measure any time shorter than a
tick, so it rounded up all sleeps to "the next tick".

Certainly there are machines and kernels that count time more precisely 
(isn't PG ported to QNX?).  We do users of such kernels no favors by 
pretending they only count clock ticks.  Furthermore, a 1ms clock
tick is pretty common, e.g. on Alpha boxes.  A 10ms initial delay is 
ten clock ticks, far longer than seems appropriate.

This argues for yielding the minimum discernable amount of time (1us)
and then backing off to a less-minimal time (1ms).  On systems that 
chug at 10ms, this is equivalent to a sleep of up-to-10ms (i.e. until 
the next tick), then a sequence of 10ms sleeps; on dumbOS Alphas, it's 
equivalent to a sequence of 1ms sleeps; and on a smartOS on an Alpha it's 
equivalent to a short, variable time (long enough for other runnable 
processes to run and yield) followed by a sequence of 1ms sleeps.  
(Some of the numbers above are doubled on really dumb kernels, as
Tom noted.)
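A hedged sketch of that yield-then-back-off policy, using the portable zero-fd select() idiom (function names invented):

```c
#include <sys/select.h>
#include <sys/time.h>

/* Sleep for roughly usec microseconds.  A select() with no file
 * descriptors is the portable sub-second sleep; kernels that only
 * count clock ticks will round the delay up to the next tick. */
void micro_sleep(long usec)
{
    struct timeval tv;

    tv.tv_sec  = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    select(0, 0, 0, 0, &tv);
}

/* First attempt: yield the minimum discernable time (1us); after
 * that, back off to 1ms sleeps, as proposed above. */
void backoff_sleep(int attempt)
{
    micro_sleep(attempt == 0 ? 1 : 1000);
}
```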

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Re: WAL and commit_delay

2001-02-17 Thread Nathan Myers

On Sat, Feb 17, 2001 at 06:30:12PM -0500, Brent Verner wrote:
 On 17 Feb 2001 at 17:56 (-0500), Tom Lane wrote:
 
 [snipped]
 
 | Is anyone out there running a 2.4 Linux kernel?  Would you try pgbench
 | with current sources, commit_delay=0, -B at least 1024, no -F, and see
 | how the results change when pg_fsync is made to call fdatasync instead
 | of fsync?  (It's in src/backend/storage/file/fd.c)
 
 I've not run this requested test, but glibc-2.2 provides this bit
 of code for fdatasync, so it /appears/ to me that kernel version
 will not affect the test case.
 
 [glibc-2.2/sysdeps/generic/fdatasync.c]
 
   int
   fdatasync (int fildes)
   {
   return fsync (fildes);
   }

In the 2.4 kernel it says (fs/buffer.c)

   /* this needs further work, at the moment it is identical to fsync() */
   down(&inode->i_sem);
   err = file->f_op->fsync(file, dentry);
   up(&inode->i_sem);

We can probably expect this to be fixed in an upcoming 2.4.x, i.e.
well before 2.6.

This is moot, though, if you're writing to a raw volume, which
you will be if you are really serious.  Then, fsync really is 
equivalent to fdatasync.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] locale support

2001-02-12 Thread Nathan Myers

On Mon, Feb 12, 2001 at 09:59:37PM -0500, Tom Lane wrote:
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  I know this is not PostgreSQL's fault but the broken locale data on
  certain platforms. The problem makes it impossible to use PostgreSQL
  RPMs in Japan.
 
  I'm looking for solutions/workarounds for this problem.
 
 Build a set of RPMs without locale support?

Run it with LC_ALL="C".

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Syslog and pg_options (for RPMs)

2001-02-09 Thread Nathan Myers

On Thu, Feb 08, 2001 at 11:36:38PM -0500, Vince Vielhaber wrote:
 On 8 Feb 2001, Ian Lance Taylor wrote:
 
  Unfortunately, the license [to splogger] probably precludes
  including it with Postgres.  Fortunately, it's only 72 lines long, and
  would be trivial to recreate.
 
 I missed most of this, but has anyone actually ASKED Dan for permission?

What's the point?  I've attached an independent implementation.
It recognizes tags for all seven levels.  It needs no command-line 
arguments.  Untagged messages end up logged as "LOG_NOTICE".  
Use it freely.

Nathan Myers
[EMAIL PROTECTED]

--
/* pglogger: stdin-to-syslog gateway for postgresql.
 *
 * Copyright 2001 by Nathan Myers [EMAIL PROTECTED]
 * Permission is granted to make copies for any purpose if
 * this copyright notice is retained unchanged.
*/

#include <stdio.h>
#include <stddef.h>
#include <syslog.h>
#include <string.h>

char* levels[] =
{
"", "emerg:", "alert:", "crit:", "err:",
"warning:", "notice:", "info:", "debug:" 
};

int lengths[] = 
{
0, sizeof("emerg"), sizeof("alert"), sizeof("crit"), sizeof("err"),
sizeof("warning"), sizeof("notice"), sizeof("info"), sizeof("debug")
};

int priorities[] = 
{
  LOG_NOTICE, LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, 
  LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG
};

int main()
{
char buf[301];
int c;
char* pos = buf;
int colon = 0;

#ifndef DEBUG
openlog("postgresql", LOG_CONS, LOG_LOCAL1);
#endif
while ( (c = getchar()) != EOF) {
if (c == '\r') {
  continue;
}
if (c == '\n') {
int level = (colon ? sizeof(levels)/sizeof(*levels) : 1);
char* bol;

*pos = 0;
while (--level) {
            if (pos - buf >= lengths[level]
                && strncmp(buf, levels[level], lengths[level]) == 0) {
break; 
}
}
bol = buf + lengths[level];
        if (bol > buf && *bol == ' ') {
++bol;
}
        if (pos - bol > 0) {
#ifndef DEBUG
syslog(priorities[level], "%s", bol);
#else
printf("%d/%s\n", priorities[level], bol);
#endif
}
pos = buf;
colon = 0;
continue;
}
if (c == ':') {
colon = 1;
}
        if ((size_t)(pos - buf) < sizeof(buf)-1) {
*pos++ = c;
}
}
return 0;
}



Re: [HACKERS] Btree runtime recovery. Stuck spins.

2001-02-09 Thread Nathan Myers

On Fri, Feb 09, 2001 at 01:23:35PM -0500, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Our spinlocks don't go into an infinite test loop, right?  They back off
  and retest at random intervals.
 
 Not very random --- either 0 or 10 milliseconds.  (I think there was
 some discussion of changing that, but it died off without agreeing on
 anything.) ...

I think we agreed that 0 was just wrong, but nobody changed it.
Changing it to 1 microsecond would be the smallest reasonable 
change.  As it is, it just does a bunch of no-op syscalls each time it
wakes up after a 10ms sleep, without yielding the CPU.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Syslog and pg_options (for RPMs)

2001-02-09 Thread Nathan Myers

Here's the latest version of the pg_logger utility.  
The particular questions that come to my mind are:

1. Do the prefixes it watches for match what PG produces?
2. Should it log to LOG_LOCAL1 or to some other LOG_LOCALn?
3. Is the ident string ("postgresql") right?
4. Are the openlog() args right?  (E.g. should it ask for LOG_PID too?)
5. What am I failing to ask about?

I'd like to turn it over to whoever can answer those questions.

Nathan Myers
[EMAIL PROTECTED]

-
/* pg_logger: stdin-to-syslog gateway for postgresql.
 *
 * Copyright 2001 by Nathan Myers [EMAIL PROTECTED]
 * This software is distributed free of charge with no warranty of any kind.
 * You have permission to make copies for any purpose, provided that (1) 
 * this copyright notice is retained unchanged, and (2) you agree to 
 * absolve the author of all responsibility for all consequences arising 
 * from any use.  
 */

#include <stdio.h>
#include <stddef.h>
#include <syslog.h>
#include <string.h>

struct {
char *name;
int size;
int priority;
} tags[] = {
{ "", 0, LOG_NOTICE },
{ "emerg:",   sizeof("emerg"),   LOG_EMERG },
{ "alert:",   sizeof("alert"),   LOG_ALERT },
{ "crit:",sizeof("crit"),LOG_CRIT },
{ "err:", sizeof("err"), LOG_ERR },
{ "error:",   sizeof("error"),   LOG_ERR },
{ "warning:", sizeof("warning"), LOG_WARNING },
{ "notice:",  sizeof("notice"),  LOG_NOTICE },
{ "info:",sizeof("info"),LOG_INFO },
{ "debug:",   sizeof("debug"),   LOG_DEBUG }
};

int main()
{
char buf[301];
int c;
char *pos = buf;
const char *colon = 0;

#ifndef DEBUG
openlog("postgresql", LOG_CONS, LOG_LOCAL1);
#endif
while ( (c = getchar()) != EOF) {
if (c == '\r') {
  continue;
}
if (c == '\n') {
int level = sizeof(tags)/sizeof(*tags);
char *bol;

        if (colon == 0 || (size_t)(colon - buf) > sizeof("warning")) {
level = 1;
}
*pos = 0;
while (--level) {
            if (pos - buf >= tags[level].size
                && strncmp(buf, tags[level].name, tags[level].size) == 0) {
break; 
}
}
bol = buf + tags[level].size;
        if (bol > buf && *bol == ' ') {
++bol;
}
        if (pos - bol > 0) {
#ifndef DEBUG
syslog(tags[level].priority, "%s", bol);
#else
printf("%d/%s\n", tags[level].priority, bol);
#endif
}
pos = buf;
colon = (char const *)0;
continue;
}
        if (c == ':' && !colon) {
colon = pos;
}
        if ((size_t)(pos - buf) < sizeof(buf)-1) {
*pos++ = c;
}
}
return 0;
}



Re: [HACKERS] Syslog and pg_options (for RPMs)

2001-02-08 Thread Nathan Myers

On Thu, Feb 08, 2001 at 04:00:12PM -0500, Lamar Owen wrote:
 "Dominic J. Eidson" wrote:
  On Thu, 8 Feb 2001, Lamar Owen wrote:
   A syslogger of stderr would make a nice place to pipe the output :-).
   'postmaster 2>&1 | output-to-syslog-program -f facility.desired' or
 
  2>&1 | logger -p facility.level
 [snip]
   Logger provides a shell command interface to the syslog(3) system log
   module.
 
 Good. POSIX required, and part of the base system (basically, guaranteed
 to be there on any Linux box).  Thanks for the pointer.

Not so fast... logger just writes its arguments to syslog.  I don't
see any indication that it (portably) reads its standard input.
It's meant for use in shellscripts.  You could write:

 ... 2>&1 | while read i; do logger -p local1.warning -t 'PG ' -- "$i"; done

but syslog is pretty high-overhead already without starting up logger
on every message.  Maybe stderr messages are infrequent enough that
it doesn't matter.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] PL/pgsql EXECUTE 'SELECT INTO ...'

2001-02-07 Thread Nathan Myers

On Wed, Feb 07, 2001 at 10:15:02PM -0500, Tom Lane wrote:
 I have looked a little bit at what it'd take to make SELECT INTO inside
 an EXECUTE work the same as it does in plain plpgsql  ...
 If we do nothing now, and then implement this feature in 7.2, we will
 have a backwards compatibility problem: EXECUTE 'SELECT INTO ...'
 will completely change in meaning.
 
 I am inclined to keep our options open by forbidding EXECUTE 'SELECT
 INTO ...' for now. ... if [not] I think we'll regret it later.

I agree, disable it.  But put a backpatch into contrib along with 
a reference to this last e-mail.  Anybody who cares enough can
apply the patch, and will be prepared for the incompatibility.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] OID from insert has extra letter

2001-02-06 Thread Nathan Myers

On Tue, Feb 06, 2001 at 01:21:00PM -0500, Bruce Momjian wrote:
   *** fe-exec.c 2001/01/24 19:43:30 1.98
   --- fe-exec.c 2001/02/06 02:02:27 1.100
   ***
   *** 2035,2041 
  if (len > 23)
  len = 23;
  strncpy(buf, res->cmdStatus + 7, len);
! buf[23] = '\0';
 
 return buf;
 }
   --- 2035,2041 
  if (len > 23)
  len = 23;
  strncpy(buf, res->cmdStatus + 7, len);
! buf[len] = '\0';
 
 return buf;
 }
   
  
  Hmm, is there some undocumented feature of strncpy that I don't know
  about, where it modifies the passed length variable (which would be hard,
  since it's pass by value)? Otherwise, doesn't this patch just replace
  the constant '23' with the variable 'len', set to 23?
 
 What if len > 23?

If len < 23, then strncpy will have terminated the destination
already.  Poking out buf[23] just compensates for a particular
bit of brain damage in strncpy.  Read the man page:

  The strncpy() function is similar [to strcpy], except that not
  more than n bytes of src are copied. Thus, if there is no null
  byte among the first n bytes of src, the result will not be
  null-terminated.

Thus, the original code is OK, except probably the literal "23"
in place of what should be a meaningful symbolic constant, or
(at least!) sizeof(buf) - 1.
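The man-page behavior quoted above is easy to demonstrate (helper name invented for illustration):

```c
#include <string.h>

/* Returns 1 if strncpy() left dst null-terminated within the first
 * n bytes, 0 if not.  dst is pre-filled with 'X' so any terminator
 * found must have been written by strncpy itself. */
int strncpy_terminated(const char *src, size_t n)
{
    char dst[32];

    memset(dst, 'X', sizeof(dst));
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != 0;
}
```

Short sources get null-padded; sources of length n or more leave the buffer unterminated, which is why the caller must poke in the '\0' itself.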

BTW, that static buffer in PGoidStatus is likely to upset threaded 
client code...

<ob-ed>
To null-terminate strings is an Abomination.  
</ob-ed>

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] using the same connection?

2001-02-06 Thread Nathan Myers

On Tue, Feb 06, 2001 at 11:08:49AM -0500, Mathieu Dube wrote:
 Hi y'all,
   Is it a bad idea for an app to keep just a couple of connections to a
 database, put semaphore/mutex on them and reuse them all through the program?
   Of course I would check if their PQstatus isnt at CONNECTION_BAD and
 reconnect if they were...

You would have to hold the lock from BEGIN until COMMIT.
Otherwise, connection re-use is normal.  
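A minimal sketch of that locking discipline (pthreads; conn_exec() stands in for PQexec() on the shared connection, since the point here is only the lock scope):

```c
#include <pthread.h>

static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;
static int statements_run = 0;

/* Stand-in for PQexec() on one shared connection. */
static void conn_exec(const char *sql)
{
    (void) sql;
    statements_run++;
}

/* Hold the lock from BEGIN until COMMIT; otherwise two threads'
 * statements interleave inside one transaction. */
void run_transaction(const char *sql)
{
    pthread_mutex_lock(&conn_lock);
    conn_exec("BEGIN");
    conn_exec(sql);
    conn_exec("COMMIT");
    pthread_mutex_unlock(&conn_lock);
}
```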

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] configure problem with krb4 and ssl when compiling 7.1beta4

2001-02-02 Thread Nathan Myers

On Fri, Feb 02, 2001 at 12:03:14PM +, Jun Kuwamura wrote:
   Furthermore, the newest version of PyGreSQL is 3.1 instead of 2.5.

Is this on the TODO-7.1 list?

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-25 Thread Nathan Myers

On Thu, Jan 25, 2001 at 09:47:16PM +0100, Frank Joerdens wrote:
 On Thu, Jan 25, 2001 at 12:04:40PM -0800, Ian Lance Taylor wrote:
 [ . . . ]
   for the /tmp directory, which looks distinctly odd to me. What kind of
   device is swap (I know what swap is normally but I didn't know you could
   mount stuff there . . . )??
  
  That is a tmpfs file system which uses swap space for /tmp storage.
  Both swap usage and /tmp compete for the same partition on the disk.
  If you have a lot of swapping programs, you don't get to put much in
  /tmp.  If you have a lot of files in /tmp, you don't get to run many
  programs.
  
  As far as I can recall, this is a Sun specific thing.
  
  It's a reasonable idea on a stable system.  It's a pretty crummy idea
  on a development system, or one with unpredictable loads.  My
  experience is that either something goes crazy and fills up /tmp and
  then you can't run anything else and you have to reboot, or something
  goes crazy and fills up swap and then you can't write any /tmp files
  and daemon processes start to silently die and you have to reboot.
 
 Very peculiar, or crummy, indeed. This system is not used by anyone
 else besides myself at the moment (cuz it's just being built up), as far
 a I can tell, and is ludicrously overpowered (3 CPUs, 768 MB RAM) for
 the mundane uses I am subjecting it to (installing and testing
 Postgresql).

I doubt you can blame any problems on tmpfs, here.  tmpfs has been 
in Solaris for many years, and has had plenty of time to stabilize.
With 768M of RAM and running only PG you are not using any swap space at 
all, and unix sockets don't use any appreciable space either, so the 
conflicts Ian describes are impossible in your case.  

Nathan Myers
[EMAIL PROTECTED]



Re: AW: [HACKERS] like and optimization

2001-01-22 Thread Nathan Myers

On Mon, Jan 22, 2001 at 05:46:09PM -0500, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Is there any possibility to use, in a portable way, only our own locale 
  definition files, without reimplementing all the sorts uppercases etc. ?
 
 The situation is not too much different for timezones, BTW.  Might make
 sense to deal with both of those problems in the same way.

The timezone situation is much better, in that there is a separate
organization which maintains a timezone database and code to operate
on it.  It wouldn't be necessary to include the package with PG, 
because it can be got at a standard place.  You would only need 
scripts to download, build, and integrate it.

 Are there any BSD-license locale and/or timezone libraries that we might
 assimilate in this way?  We could use an LGPL'd library if there is no
 other alternative, but I'd just as soon not open up the license issue.

Posix systems include a set of commands for dumping locales in a standard 
format, and building from them.  Instead of shipping locales and code to 
operate on them, one might include a script to run these tools (where 
they exist) to dump an existing locale, edit it a bit, and build a more 
PG-friendly locale.

Nathan Myers
[EMAIL PROTECTED]


