Re: [HACKERS] beta testing version

2000-12-01 Thread Nathan Myers

On Fri, Dec 01, 2000 at 01:54:23AM -0500, Alex Pilosov wrote:
 On Thu, 30 Nov 2000, Nathan Myers wrote:
  After a power outage on an active database, you may have corruption
  at low levels of the system, and unless you have enormous redundancy
  (and actually use it to verify everything) the corruption may go 
  undetected and result in (subtly) wrong answers at any future time.

 Nathan, why are you so hostile against postgres? Is there an ax to grind?

Alex, please don't invent enemies.  It's clear what important features
PostgreSQL still lacks; over the next several releases these features
will be implemented, at great expense.  PostgreSQL is useful and usable
now, given reasonable precautions and expectations.  In the future it
will satisfy greater (albeit still reasonable) expectations.

 The conditions under which WAL will completely recover your database:

 1) OS guarantees complete ordering of fsync()'d writes (i.e. given two
 blocks A and B, if A is fsync'd before B, it can NOT happen that B is on
 disk but A is not).
 2) on boot recovery, the OS must not corrupt anything that was fsync'd.
 
 Rule 1) is met by all unixish OSes in existence. Rule 2 is met by some
 filesystems, such as reiserfs, tux2, and softupdates.

No.  The OS asks the disk to write blocks in a certain order, but 
disks normally reorder writes.  Not only that; as noted earlier, 
typical disks report the write completed long before the blocks 
actually hit the disk.
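
For illustration, a minimal sketch (assuming POSIX semantics; error
handling abbreviated) of what fsync() does and does not promise.  The
kernel orders its requests to the drive, but a drive with a volatile
write cache may still acknowledge and reorder them:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char block_a[8192], block_b[8192];
        memset(block_a, 'A', sizeof block_a);
        memset(block_b, 'B', sizeof block_b);

        int fd = open("datafile", O_WRONLY | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* The kernel will not issue the write of B before fsync() of A
         * has returned. */
        if (write(fd, block_a, sizeof block_a) < 0) return 1;
        if (fsync(fd) < 0) return 1;    /* "A is on disk" -- or is it? */
        if (write(fd, block_b, sizeof block_b) < 0) return 1;
        if (fsync(fd) < 0) return 1;

        /* A drive with its write cache enabled may have acknowledged A
         * while the bits were still in cache, then committed B to the
         * platters first.  After a power loss, B can be on disk without
         * A -- exactly the reordering described above. */
        close(fd);
        return 0;
    }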

A logging file system protects against the simpler forms of OS crash,
where the OS data-structure corruption is noticed before any more disk
writes are scheduled.  It can't (by itself) protect against disk
errors.  For critical applications, you must supply that protection
yourself, with (e.g.) battery-backed mirroring.

  The logging in 7.1 protects transactions against many sources of 
  database crash, but not necessarily against OS crash, and certainly
  not against power failure.  (You might get lucky, or you might just 
  think you were lucky.)  This is the same as for most databases; an
  embedded database that talks directly to the hardware might be able
  to do better.  

The best possible database code can't overcome a broken OS or a broken 
disk.  It would be unreasonable to expect otherwise.

Nathan Myers
[EMAIL PROTECTED] 



[HACKERS] Bitmap index

2000-12-01 Thread pejac

Hello,

Please excuse my bad English.

A question about bitmap indexes: commercial databases (Oracle, DB2)
support bitmap-type indexes, which are generally used for low-cardinality
fields such as sex or booleans. Is this index type supported in Postgres?
If not, is it planned?

Best regards

PEJAC Pascal



Re: [HACKERS] beta testing version

2000-12-01 Thread Don Baccus

At 11:06 PM 11/30/00 -0800, Vadim Mikheev wrote:
 As for replaying logs against a restored snapshot dump... AIUI, a 
 dump records tuples by OID, but the WAL refers to TIDs.  Therefore, 
 the WAL won't work as a re-do log to recover your transactions 
 because the TIDs of the restored tables are all different.   

True for the current way of backing up - ie saving data in "external"
(sql) format. But there is another way - saving data files in their
natural (binary) format. WAL records may be applied to
such a dump, right?

Right.  That's what's missing in PG 7.1: the existence of tools to
make such backups.

Probably the best answer to the "what does WAL get us, if it doesn't
get us full recoverability" question is to simply say "it's a prerequisite
to getting full recoverability; PG 7.1 sets the foundation and later
work will get us there".



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



Re: [HACKERS] beta testing version

2000-12-01 Thread Don Baccus

At 12:30 AM 12/1/00 -0800, Ian Lance Taylor wrote:
For example, I would hope that EMC
disk systems handle power loss gracefully.

They must, their marketing literature says so :)

  But if you buy ordinary
off the shelf PC hardware, you really do need to arrange for a UPS,
and some sort of automatic shutdown if the UPS is running low.

Which is what disk subsystems like those from EMC do for you.  They've
got built-in battery backup that lets them guarantee (assuming the
hardware's working right) that in the case of a power outage, all blocks
the operating system thinks have been written will actually be written
before the disk subsystem powers itself down.



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



Re: [HACKERS] beta testing version

2000-12-01 Thread Vadim Mikheev

   As for replaying logs against a restored snapshot dump... AIUI, a 
   dump records tuples by OID, but the WAL refers to TIDs.  Therefore, 
   the WAL won't work as a re-do log to recover your transactions 
   because the TIDs of the restored tables are all different.   
  
  True for the current way of backing up - ie saving data in "external"
  (sql) format. But there is another way - saving data files in their
  natural (binary) format. WAL records may be applied to
  such a dump, right?
 
 But (AIUI) you can only safely/usefully copy those files when the 
 database is shut down.

No. You can read/save datafiles at any time. But block reads must be
"atomic" - no one should be able to change any part of a block while
we read it. cp and tar are probably not suitable for this, but an
internal BACKUP command could do this.

Restoring from such a backup will be like recovering after pg_ctl -m i stop:
all data blocks are consistent and WAL records may be applied to them.
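
For illustration only, a sketch of what such a command's inner loop might
look like (the lock calls are placeholders for whatever per-block interlock
the real BACKUP command would use; error handling omitted):

    #include <sys/types.h>
    #include <unistd.h>

    #define BLCKSZ 8192   /* PostgreSQL's default page size */

    /* Placeholders standing in for the real buffer-manager interlock. */
    static void LockBlockShared(long blkno) { (void) blkno; }
    static void UnlockBlock(long blkno)     { (void) blkno; }

    static void backup_relation(int src_fd, int dst_fd, long nblocks)
    {
        char buf[BLCKSZ];

        for (long blkno = 0; blkno < nblocks; blkno++)
        {
            LockBlockShared(blkno);  /* no writer may touch the block now */
            (void) pread(src_fd, buf, BLCKSZ, blkno * (off_t) BLCKSZ);
            UnlockBlock(blkno);

            /* The copied block may already be stale by the time it is
             * written, but it is internally consistent; replaying WAL
             * records brings it up to date. */
            (void) write(dst_fd, buf, BLCKSZ);
        }
    }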

 Many people hope to run PostgreSQL 24x7x365.  With vacuuming, you 
 might just as well shut down afterward; but when that goes away 
 (in 7.2?), when will you get the chance to take your backups?  

The ability to shut down 7.2 will be preserved -:))
But it's not required for backup.

   To get replaying we need an "update log", something that might be
   in 7.2 if somebody does a lot of work.
  
  What did you mean by "update log"?
  Are you sure that WAL is not "update log" ? -:)
 
 No, I'm not sure.  I think it's possible that a new backup utility 
 could be written to make a hot backup which could be restored and 
 then replayed using the current WAL format.  It might be easier to
 add another log which could be replayed against the existing form
 of backups.  That last is what I called the "update log".

Consistent read of data blocks is easier to implement, sure.

 The point is, WAL now does one job superbly: maintain a consistent
 on-disk database image.  Asking it to do something else, such as 
 supporting hot BAR, could interfere with it doing its main job.  
 Of course, only the person who implements hot BAR can say.

There will be no interference, because BAR will not ask WAL to do
anything beyond what it does right now: redoing changes.

Vadim





Re: [HACKERS] beta testing version

2000-12-01 Thread Don Baccus

At 11:02 AM 12/1/00 -0800, Nathan Myers wrote:
On Fri, Dec 01, 2000 at 06:39:57AM -0800, Don Baccus wrote:
 
 Probably the best answer to the "what does WAL get us, if it doesn't
 get us full recoverability" question is to simply say "it's a
 prerequisite to getting full recoverability; PG 7.1 sets the foundation
 and later work will get us there".

Not to quibble, but for most of us, the answer to Don's question is:
"It gives a ~20x speedup over 7.0."  That's pretty valuable to some of us.
If it turns out to be useful for other stuff, that's gravy.

Oh, but given that power failures eat disks anyway, you can just run PG 7.0
with -F and be just as fast as PG 7.1, eh?  With no theoretical loss in
safety?  Where's your faith in all that doom and gloom you've been 
spreading? :) :)

You're right, of course, we'll get roughly -F performance while maintaining
a much more comfortable level of risk than you get with -F.



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



[HACKERS] WAL information

2000-12-01 Thread Mitch Vincent

Ok, this has piqued my interest in learning exactly what WAL is and what it
does... I don't see any in-depth explanation of WAL on the postgresql.org
site; can someone point me to some documentation? (if any exists, that is).

Thanks!

-Mitch

- Original Message -
From: "Nathan Myers" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 11:02 AM
Subject: Re: [HACKERS] beta testing version


 On Fri, Dec 01, 2000 at 06:39:57AM -0800, Don Baccus wrote:
 
  Probably the best answer to the "what does WAL get us, if it doesn't
  get us full recoverability" question is to simply say "it's a
  prerequisite to getting full recoverability; PG 7.1 sets the foundation
  and later work will get us there".

 Not to quibble, but for most of us, the answer to Don's question is:
 "It gives a ~20x speedup over 7.0."  That's pretty valuable to some of us.
 If it turns out to be useful for other stuff, that's gravy.

 Nathan Myers
 [EMAIL PROTECTED]





Re: [HACKERS] beta testing version

2000-12-01 Thread Nathan Myers

On Fri, Dec 01, 2000 at 09:13:28PM +1100, Philip Warner wrote:
 
 You have raised some interesting issues regarding write-order etc. Can we
 assume that when fsync *returns*, all records are written - though not
 necessarily in the order that the IO's were executed?

Not with ordinary disks.  With a battery-backed disk server, yes.

Nathan Myers
[EMAIL PROTECTED]




Re: [HACKERS] beta testing version

2000-12-01 Thread Nathan Myers

On Fri, Dec 01, 2000 at 11:48:23AM -0800, Don Baccus wrote:
 At 11:09 AM 12/1/00 -0800, Nathan Myers wrote:
 On Fri, Dec 01, 2000 at 10:01:15AM +0100, Zeugswetter Andreas SB wrote:
 
  If you need to restore from offsite backup you lose transactions
  unless you transfer the WAL synchronously with every commit.
 
 Currently the only way to avoid losing those transactions is by 
 replicating transactions at the application layer.  That is, the
 application talks to two different database instances, and enters
 transactions into both.  That's pretty hard to retrofit into an
 existing application, so you'd really rather have replication in
 the database.  Of course, that's something PostgreSQL, Inc. is also 
 working on.
 
 Recovery alone isn't quite that difficult.  You don't need to instantiate
 your database instance until you need to apply the archived transactions,
 i.e. after catastrophic failure destroys your db server.

True, it's sufficient for the application just to log the text of 
its updating transactions off-site.  Then, to recover, instantiate 
a database from a backup and have the application re-run its 
transactions.  
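
A rough sketch of that application-layer scheme (all names hypothetical):
ship each statement's text to the off-site log and wait for the ack before
executing it locally, so the off-site log always covers everything the
local database may have committed:

    #include <stdbool.h>

    /* Stubs standing in for the off-site log and the local connection. */
    static bool offsite_log_append(const char *sql) { (void) sql; return true; }
    static bool local_execute(const char *sql)      { (void) sql; return true; }

    /* Log first, then execute: after a disaster, restoring a backup and
     * re-running the logged statements reproduces every transaction the
     * application believed committed.  (The log may hold a few statements
     * that never committed locally; replay must tolerate that.) */
    static bool run_logged(const char *sql)
    {
        if (!offsite_log_append(sql))   /* must be durable off-site first */
            return false;
        return local_execute(sql);
    }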

 You need to do two things:

(Remember, we're talking about what you could do *now*, with 7.1.
Presumably with 7.2 other options will open.)
 
 1. Transmit a consistent (known-state) snapshot of the database offsite.

 2. Synchronously transfer the WAL as part of every commit (question: do we
wait to log a "commit" locally until after the remote site acks that
it got the WAL?)
 
 Then you take a new machine, build a database out of the snapshot, and
 apply the archived redo logs and off you go.  If you get tired of saving
 oodles of redo archives, you make a new snapshot and accumulate the
 WAL from that point forward.
 
I don't know of any way to synchronously transfer the WAL, currently.

Anyway, I would expect doing it to interfere seriously with performance.
The "wait to log a 'commit' locally until after the remote site acks that
it got the WAL" is (akin to) the familiar two-phase commit.
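
A sketch of that rule, with hypothetical names (nothing like this exists
in 7.1): the committing backend blocks until the standby acknowledges the
WAL, and only then writes its local commit record, so a committed
transaction can never exist solely on a dead primary:

    #include <stdbool.h>

    typedef struct { const char *data; int len; } WalRecord;

    /* Stubs standing in for the network and storage layers. */
    static bool send_wal_to_remote(const WalRecord *r) { (void) r; return true; }
    static bool wait_for_remote_ack(int ms) { (void) ms; return true; }
    static void write_local_commit_record(const WalRecord *r) { (void) r; }

    static bool commit_with_remote_ack(const WalRecord *rec)
    {
        if (!send_wal_to_remote(rec))
            return false;
        if (!wait_for_remote_ack(5000))    /* this blocking wait is the
                                              performance cost */
            return false;
        write_local_commit_record(rec);    /* durable on both sites, or
                                              not committed at all */
        return true;
    }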

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] beta testing version

2000-12-01 Thread Don Baccus

At 12:56 PM 12/1/00 -0800, Nathan Myers wrote:

(Remember, we're talking about what you could do *now*, with 7.1.
Presumably with 7.2 other options will open.)

Maybe *you* are :)  Seriously, I'm thinking out loud about future
possibilities.  Putting a lot of work into building a temporary
solution on top of 7.1 doesn't make a lot of sense; anyone wanting
to work on such things ought to think about 7.2, which presumably will
go beta sometime mid-2001 or so???

And I don't think there are 7.1 hacks that are simple ... could be
wrong, though.

I don't know of any way to synchronously transfer the WAL, currently.

Nope.

Anyway, I would expect doing it to interfere seriously with performance.

Yep.  Anyone here have experience with replication and Oracle or others?
I've heard from one source that setting it up reliably in Oracle and
getting the switch from the dead to the backup server working properly was
something of a DBA nightmare, but that's true of just about anything in
Oracle.  Once it was up, it worked reliably, though (also typical
of Oracle).

The "wait to log a 'commit' locally until after the remote site acks that
it got the WAL" is (akin to) the familiar two-phase commit.

Right.



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



[HACKERS] COPY BINARY is broken...

2000-12-01 Thread Tom Lane

I've just noticed that COPY BINARY is pretty thoroughly broken by TOAST,
because what it does is to dump out verbatim the bytes making up each
tuple of the relation.  In the case of a moved-off value, you'll get
the toast reference, which is not going to be too helpful for reloading
the table data.  In the case of a compressed-in-line datum, you'll at
least have all the data there, but the COPY BINARY reader will crash
and burn when it sees it.

Fixing this while retaining backwards compatibility with the existing
COPY BINARY file format is possible, but it seems rather a headache:
we'd need to detoast all the toasted columns, then heap_formtuple a
new tuple containing the expanded data, and finally write that out.
(Can't do it on a field-by-field basis because the file format requires
the total tuple size to precede the tuple data.)  Kind of ugly.

The existing COPY BINARY file format is entirely brain-dead anyway; for
example, it wants the number of tuples to be stored at the front, which
means we have to scan the whole relation an extra time to get that info.
Its handling of nulls is bizarre, too.  I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write.  Does anyone have an opinion
pro or con about that?
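
One sketch of what an easier layout could look like (an illustration, not
a concrete proposal from this message): length-prefix each field, mark
NULLs with a length of -1, and end the stream with a sentinel instead of
an up-front tuple count, so the extra counting scan disappears.  Byte-order
handling is omitted:

    #include <stdint.h>
    #include <stdio.h>

    /* Per tuple: int16 field count, then for each field an int32 byte
     * length (-1 means NULL) followed by the raw bytes. */
    static void write_tuple(FILE *out, int16_t nfields,
                            const void **values, const int32_t *lengths)
    {
        fwrite(&nfields, sizeof nfields, 1, out);
        for (int i = 0; i < nfields; i++)
        {
            fwrite(&lengths[i], sizeof lengths[i], 1, out);
            if (lengths[i] > 0)
                fwrite(values[i], (size_t) lengths[i], 1, out);
        }
    }

    /* A field count of -1 marks end-of-data, so no tuple count (and no
     * extra scan of the relation to compute it) is needed up front. */
    static void write_trailer(FILE *out)
    {
        int16_t end = -1;
        fwrite(&end, sizeof end, 1, out);
    }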

regards, tom lane



Re: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 I would rip it out.

I thought about that too, but was afraid to suggest it ;-)

How many people are actually using COPY BINARY?

regards, tom lane



RE: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Mikheev, Vadim

 Alfred Perlstein [EMAIL PROTECTED] writes:
  I would rip it out.
 
 I thought about that too, but was afraid to suggest it ;-)
 
 How many people are actually using COPY BINARY?

It could be useful if only a single scan were required.
But I have no strong opinion about keeping it.

Vadim



RE: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Mikheev, Vadim

 The existing COPY BINARY file format is entirely brain-dead 
 anyway; for example, it wants the number of tuples to be stored
 at the front, which means we have to scan the whole relation an
 extra time to get that info. Its handling of nulls is bizarre, too.
 I'm thinking this might be a good time to abandon backwards
 compatibility and switch to a format that's a little easier to read
 and write.  Does anyone have an opinion pro or con about that?

Switch to new format.

Vadim



Re: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [001201 14:57] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  I would rip it out.
 
 I thought about that too, but was afraid to suggest it ;-)

I think you'd agree that you have more fun and important things to
do than to deal with this yucky interface. :)

 How many people are actually using COPY BINARY?

I'm not using it. :)

How about adding COPY XML?











(kidding of course about the XML, but it would make postgresql more
buzzword compliant :) )

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] COPY BINARY is broken...

2000-12-01 Thread Don Baccus

At 03:05 PM 12/1/00 -0800, Alfred Perlstein wrote:

How about adding COPY XML?
(kidding of course about the XML, but it would make postgresql more
buzzword compliant :) )

Hey, we could add a parser and call the module MyXML ...



- Don Baccus, Portland OR [EMAIL PROTECTED]
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.



[HACKERS] ODBC Driver

2000-12-01 Thread Michael Fork

I am curious as to where the newest ODBC driver source is -- I retrieved
/src/interfaces/odbc from CVS, but it appeared to only be version
6.40.0009 and was lacking the Visual C++ workspace/project files that
were in the 6.50. release zip file on the FTP server.  

Thanks

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio




[HACKERS] ALTER FUNCTION problem

2000-12-01 Thread Mark Hollomon


I am working on an implementation of 'ALTER FUNCTION' and have run into a 
problem.

plpgsql, plperl and pltcl all cache the result of a compile of prosrc.
Which leads to things like:

mhh=# create function f() returns integer as 'begin return 42; end;' language 
'plpgsql';
CREATE
mhh=# select f();
 f  
----
 42
(1 row)

mhh=# alter function f() as 'begin return 44; end;';
ALTER
mhh=# select f();
 f  
----
 42
(1 row)

mhh=# select proname, prosrc from pg_proc where proname = 'f';
 proname |        prosrc         
---------+-----------------------
 f       | begin return 44; end;

Of course, leaving psql and re-entering fixes the problem. But the same 
problem is manifested between concurrent sessions as well.

I would like to propose that a new attribute be added to pg_proc:
'proserial'. 'CREATE FUNCTION' will set proserial to 0; 'ALTER FUNCTION'
will increment it each time. It would be up to the individual PL handlers
to check that their cache is not out of date.
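
A sketch of the check a handler might make under this proposal (the
proserial column and both helpers are hypothetical):

    #include <stddef.h>

    typedef struct
    {
        unsigned int proserial;  /* pg_proc.proserial seen at compile time */
        void        *compiled;   /* cached compilation of prosrc */
    } CachedFunc;

    /* Stubs standing in for a catalog lookup and the PL's compiler. */
    static unsigned int fetch_proserial(unsigned int func_oid)
    { (void) func_oid; return 0; }
    static void *compile_prosrc(unsigned int func_oid)
    { (void) func_oid; return NULL; }

    static void *get_compiled(CachedFunc *cf, unsigned int func_oid)
    {
        unsigned int current = fetch_proserial(func_oid);

        if (cf->compiled == NULL || cf->proserial != current)
        {
            /* ALTER FUNCTION bumped the serial: recompile. */
            cf->compiled = compile_prosrc(func_oid);
            cf->proserial = current;
        }
        return cf->compiled;
    }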

Is there a better way to solve this problem?


-- 
Mark Hollomon



Re: [HACKERS] Re: [SQL] Rules with Conditions: Bug, or Misunderstanding

2000-12-01 Thread Mark Hollomon

On Friday 01 December 2000 00:33, Tom Lane wrote:
 The rewriting is done, all right, but what's left afterward still has
 references to the view, because each rule is conditional.  Essentially,
 the rewriter output looks like

   -- rule 1
   if (rule1 condition holds)
       -- rule 2 applied to rule1 success case
       if (rule2 condition holds)
           apply rule 2's query
       else
           apply rule 1's query
   else
       -- rule 2 applied to rule1 failure case
       if (rule2 condition holds)
           apply rule 2's query
       else
           apply original query

 If the system were capable of determining that either rule1 or rule2
 condition will always hold, perhaps it could deduce that the original
 query on the view will never be applied.  However, I doubt that we
 really want to let loose an automated theorem prover on the results
 of every rewrite ...

I think it would be better to move the test further down, to just before we
actually try to do the update/insert - maybe into the heap access routines,
as suggested by Andreas.


-- 
Mark Hollomon



Re: [HACKERS] ALTER FUNCTION problem

2000-12-01 Thread Tom Lane

Mark Hollomon [EMAIL PROTECTED] writes:
 plpgsql, plperl and pltcl all cache the result of a compile of prosrc.

plpgsql does, but I didn't think the other two do.

 I would like to propose that a new attribute be added to pg_proc:
 'proserial'.  'CREATE FUNCTION' will set proserial to 0; 'ALTER
 FUNCTION' will increment it each time. It would be up to the
 individual PL handlers to check that their cache is not
 out of date.

This is completely inadequate for plpgsql, if not for the others,
because plpgsql also caches query plans --- which depend on more than
the text of the function.  I don't think it's worth our time to put
in a partial solution; we need to think about a generic cache
invalidation mechanism.

Jan Wieck has posted some comments about this, and I think there was
also some discussion in connection with Karel Zak's proposed cross-
backend query plan cache.  Check the archives...

regards, tom lane