Re: [HACKERS] A 2 phase commit weirdness

2005-05-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 But at awakening, the user will get this:
 ERROR:  relation 66002 deleted while still in use
 This is ugly -- I don't think there is a way to get out of it.

There had better be a way, since (I suppose) the ERROR is preventing the
commit from succeeding ...

 Unrelated question: is it intended that the prepared transactions are
 visible cross-database through pg_prepared_xacts?

That is a good question.  Can a backend running in a different database
execute the COMMIT (or ROLLBACK)?  Offhand I'd bet that will not work,
which suggests we'd better make the view per-database.  [ thinks a bit
more... ]  We might be able to make it work, but there seems like a lot
of potential for bugs/fragility there.  Might be best to take the narrow
definition to start with.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Backslash handling in strings

2005-05-31 Thread Dennis Bjorklund
On Tue, 31 May 2005, Tom Lane wrote:

 The case that convinced me we need to keep some sort of backslash
 capability is this: suppose you want to put a string including a tab
 into your database.  Try to do it with psql:
  t=> insert into foo values ('<TAB>
 Guess what: you won't get anywhere, at least not unless you disable
 readline.  So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB.  This
works at least with readline as it is set up in redhat/fedora (and readline
can be set up in 1000 different ways, so who knows how portable this is).

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Backslash handling in strings

2005-05-31 Thread Tom Ivar Helbekkmo
Dennis Bjorklund [EMAIL PROTECTED] writes:

 To insert a tab using readline you can press ESC followed by TAB.

...or ^V followed by TAB, as per age-old tradition.  :-)

-tih
-- 
Don't ascribe to stupidity what can be adequately explained by ignorance.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Backslash handling in strings

2005-05-31 Thread Dennis Bjorklund
On Tue, 31 May 2005, Tom Ivar Helbekkmo wrote:

 ...or ^V followed by TAB, as per age-old tradition.  :-)

Right, I forgot about that one. One can also do other control characters
instead of TAB by pressing CTRL-J and similar.

Well, I just wanted to point out that it's possible. The main problem is
still to make sure that old dumps work and can be imported. I don't see
how that can work without a GUC variable in addition to the E'foo' stuff
(but that's not so bad as it can be phased in to support old pg_dumps and
phased out again in pg 10 or something).

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 27 May 2005 17:49
 To: Mark Cave-Ayland (External)
 Cc: 'Manfred Koizar'; 'Greg Stark'; 'Bruce Momjian'; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

 I went back and looked at the code, and see that I was misled by
 terminology: what we've been calling 2x32 in this thread is 
 not two independent CRC32 calculations, it is use of 32-bit 
 arithmetic to execute one CRC64 calculation. 

Yeah, I did find the terminology a little confusing until I looked at the
source itself. It doesn't make much sense publishing numbers if you don't
know their meaning ;)

 Based on the numbers we've seen so far, one could argue for 
 staying with the 64-bit CRC, but changing the rule we use for 
 selecting which implementation code to use: use the true 
 64-bit code only when sizeof(unsigned long) == 64, and 
 otherwise use the 2x32 code, even if there is a 64-bit 
 unsigned long long type available.  This essentially assumes 
 that the unsigned long long type isn't very efficient, which 
 isn't too unreasonable.  This would buy most of the speedup 
 without giving up anything at all in the error-detection department.

All our servers are x86-based Linux with gcc, so if a factor of 2 speedup
for CPU calculations is the minimum improvement that we get as a result of
this thread then I would be very happy.

 Alternatively, we might say that 64-bit CRC was overkill from 
 day one, and we'd rather get the additional 10% or 20% or so 
 speedup.  I'm kinda leaning in that direction, but only weakly.

What would you need to persuade you either way? I believe that disk drives
use CRCs internally to verify that the data has been read correctly from
each sector. If the majority of the errors would be from a disk failure,
then a corrupt sector would have to pass the drive CRC *and* the PostgreSQL
CRC in order for an XLog entry to be considered valid. I would have thought
the chances of this being able to happen would be reasonably small and so
even with CRC32 this can be detected fairly accurately.

In the case of an OS crash then we could argue that there may be a partially
written sector to the disk, in which case again either one or both of the
drive CRC and the PostgreSQL CRC would be incorrect and so this condition
could also be reasonably detected using CRC32.

As far as I can tell, the main impact of this would be that we would reduce
the ability to accurately detect multiple random bit errors, which is more
the type of error I would expect to occur in RAM (alpha particles etc.). How
often would this be likely to occur? I believe that different generator
polynomials have different characteristics that can make them more sensitive
to a particular type of error. Perhaps Manfred can tell us the generator
polynomial that was used to create the lookup tables?


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] A 2 phase commit weirdness

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 02:09:56AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  But at awakening, the user will get this:
  ERROR:  relation 66002 deleted while still in use
  This is ugly -- I don't think there is a way to get out of it.
 
 There had better be a way, since (I suppose) the ERROR is preventing the
 commit from succeeding ...

No, the ERROR is in a completely unrelated transaction.  The scenario
again is this:

CREATE TABLE foo ();

-- session 1:
BEGIN;
DROP TABLE foo;
PREPARE TRANSACTION 'foo';

-- session 2:
SELECT * FROM foo;
-- hangs

-- session 1:
COMMIT TRANSACTION 'foo';

-- session 2 then gets:
ERROR:  relation deleted while still in use

So it's a rather contorted situation to begin with.

  Unrelated question: is it intended that the prepared transactions are
  visible cross-database through pg_prepared_xacts?
 
 That is a good question.  Can a backend running in a different database
 execute the COMMIT (or ROLLBACK)?  Offhand I'd bet that will not work,
 which suggests we'd better make the view per-database.  [ thinks a bit
 more... ]  We might be able to make it work, but there seems like a lot
 of potential for bugs/fragility there.  Might be best to take the narrow
 definition to start with.

Ok.

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso. (Ernesto Hernández-Novich)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Oracle Style packages on postgres

2005-05-31 Thread Jonah H. Harris

Thanks for adding this Bruce!

Is anyone going to be working on this immediately?  If so, I'd be glad 
to work with someone.  Unfortunately, I don't have the time to devote to 
taking something this big on, but I think it would be a really great 
thing to have.  Just let me know [EMAIL PROTECTED] OR 
[EMAIL PROTECTED]  Thanks!


Bruce Momjian wrote:


Added to TODO:

* Add the features of packages
   o  Make private objects accessible only to objects in the same schema
   o  Allow current_schema.objname to access current schema objects
   o  Add session variables
   o  Allow nested schemas


---

Bruce Momjian wrote:
 


OK, so it seems we need:

o  make private objects accessible only to objects
   in the same schema
	o  Allow current_schema.objname to access current 
	   schema objects

o  session variables
o  nested schemas?

---

Dave Held wrote:
   


-Original Message-
From: Bruce Momjian [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 10, 2005 8:43 AM
To: Thomas Hallgren
Cc: Tom Lane; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Oracle Style packages on postgres

[...]
I suppose.  I think we should focus on the use cases for Oracle
packages, rather than the specific functionality it provides. 
What things do people need PostgreSQL to do that it already

doesn't do?
   


Is that really the best way to go about things?  Already RDBMSes
are patchwork quilts of functionality.  Is merely adding another
patch the most elegant way to evolve the database?  The problem is
that Oracle et al are trying to be ORDBMSes and aren't exactly sure
what the best way to go is.  Instead of trying to formulate a 
rational plan for what an ORDBMS should even look like, they simply

look at what would work with their existing infrastructure and tack
on features.  Then Postgres plays the copycat game.  Instead of
trying to play catch-up with Oracle, why not beat them at their own
game?

What packages provide is encapsulation.  Hiding the data from the
user and forcing him/her to use the public interface (methods).
That is an important and admirable OO feature.  Some people think
that using the DB's security model can achieve the same thing.  It
can't, exactly, but there's an important lesson to be learned from
the suggestion.  The problem is that OOP is a *programming* paradigm,
and a database is not a *programming language*.  In a programming
language, there really is no such thing as security.  There is 
only visibility and accessibility.  Private methods in an OOP

language do not provide *security*; they only limit *accessibility*.
Like so many other differences between the relational model and the
OOP model, there is an impedance mismatch here.  However, there is
also opportunity.

In an OOPL, you can say: Users can call this method from here, but
not from there.  What you *can't* say is: User X can call this
method, but User Y cannot.  As you can see, these are orthogonal
concepts.  You could call the first accessibility by location and
the second accessibility by authentication.  An ORDBMS should
support both.  Private does not respect your identity, only your
calling location.  An ACL does not respect your calling scope, only
your identity.  A system that has both is clearly more flexible than
one that only has one or the other.

Now what you need to keep in mind is that each visibility model 
serves a different purpose.  The purpose of a security model is to 
limit *who* can see/touch certain data because the data has intrinsic 
value.  The purpose of an accessibility model is to limit *where* and 
*how* data can be seen/touched in order to preserve *program 
invariants*.  So if you have an object (or tuple!) that records the 
start and stop time of some process, it is probably a logical 
invariant that the stop time is greater than or equal to the start 
time.  For this reason, in a PL, you would encapsulate these fields 
(attributes) and only provide controlled access to update them that 
checks and preserves the invariant, *no matter who you are*.  You 
don't want a superuser violating this invariant any more than Sue 
User.


Now you might object that constraints allow you to preserve 
invariants as well, and indeed they do.  But constraints do not

respect calling scope.  Suppose there is a process that needs to
update the timestamps in a way that temporarily breaks the invariant
but restores it afterwards.  The only way to effect this in a
constraint environment is to drop the constraint, perform the
operation, and restore it.  However, dropping a constraint is not an
ideal solution because there may be other unprivileged processes 
operating on the relation that still need the constraint to be 
enforced.  There is no way to say: There is a priviledged class of 
methods that is 

Re: [HACKERS] Fix for cross compilation

2005-05-31 Thread Peter Eisentraut
Am Montag, 30. Mai 2005 20:11 schrieb Tom Lane:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  The problem is that the program zic in src/timezone/ is built and run
  during the build process, which doesn't work if it's compiled by a
  cross-compiler.

 Why don't we instead arrange to run it during install?

It does currently run during the install.  How does that help?

 I'm not real thrilled with the notion of trying to use a zic built by a
 different compiler; I think that will lead to all sorts of problems,
 considering that the files it's meant to write are binary and
 at least potentially architecture-specific.  Also there's the problem
 that it is reading a pg_config.h that is definitely platform-specific.

Well, that is true.  Bummer.

 BTW, the truth of the matter is that we've never supported
 cross-compilation; see all the AC_TRY_RUN operations in configure.

They all have (or should have) fall-back values for cross-compilation.  Else, 
those who complain about zic would have never gotten there.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Oracle Style packages on postgres

2005-05-31 Thread Stephen Frost
* Jonah H. Harris ([EMAIL PROTECTED]) wrote:
 Is anyone going to be working on this immediately?  If so, I'd be glad 
 to work with someone.  Unfortunately, I don't have the time to devote to 
 taking something this big on, but I think it would be a really great 
 thing to have.  Just let me know [EMAIL PROTECTED] OR 
 [EMAIL PROTECTED]  Thanks!

It strikes me as slightly unlikely that anyone will start working on
this immediately, but I can tell you it's something that some of my
users have been asking for and so once I finish off my current work on
roles I'll probably be interested in working on this.

Stephen

 Bruce Momjian wrote:
 
 Added to TODO:
 
 * Add the features of packages
 o  Make private objects accessible only to objects in the same 
schema
o  Allow current_schema.objname to access current schema objects
o  Add session variables
o  Allow nested schemas
 
 
 ---
 
 Bruce Momjian wrote:
  
 
 OK, so it seems we need:
 
 o  make private objects accessible only to objects
in the same schema
 o  Allow current_schema.objname to access current 
schema objects
 o  session variables
 o  nested schemas?
 
 ---
 
 Dave Held wrote:

 
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 10, 2005 8:43 AM
 To: Thomas Hallgren
 Cc: Tom Lane; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Oracle Style packages on postgres
 
 [...]
 I suppose.  I think we should focus on the use cases for Oracle
 packages, rather than the specific functionality it provides. 
 What things do people need PostgreSQL to do that it already
 doesn't do?

 
 Is that really the best way to go about things?  Already RDBMSes
 are patchwork quilts of functionality.  Is merely adding another
 patch the most elegant way to evolve the database?  The problem is
 that Oracle et al are trying to be ORDBMSes and aren't exactly sure
 what the best way to go is.  Instead of trying to formulate a 
 rational plan for what an ORDBMS should even look like, they simply
 look at what would work with their existing infrastructure and tack
 on features.  Then Postgres plays the copycat game.  Instead of
 trying to play catch-up with Oracle, why not beat them at their own
 game?
 
 What packages provide is encapsulation.  Hiding the data from the
 user and forcing him/her to use the public interface (methods).
 That is an important and admirable OO feature.  Some people think
 that using the DB's security model can achieve the same thing.  It
 can't, exactly, but there's an important lesson to be learned from
 the suggestion.  The problem is that OOP is a *programming* paradigm,
 and a database is not a *programming language*.  In a programming
 language, there really is no such thing as security.  There is 
 only visibility and accessibility.  Private methods in an OOP
 language do not provide *security*; they only limit *accessibility*.
 Like so many other differences between the relational model and the
 OOP model, there is an impedance mismatch here.  However, there is
 also opportunity.
 
 In an OOPL, you can say: Users can call this method from here, but
 not from there.  What you *can't* say is: User X can call this
 method, but User Y cannot.  As you can see, these are orthogonal
 concepts.  You could call the first accessibility by location and
 the second accessibility by authentication.  An ORDBMS should
 support both.  Private does not respect your identity, only your
 calling location.  An ACL does not respect your calling scope, only
 your identity.  A system that has both is clearly more flexible than
 one that only has one or the other.
 
 Now what you need to keep in mind is that each visibility model 
 serves a different purpose.  The purpose of a security model is to 
 limit *who* can see/touch certain data because the data has intrinsic 
 value.  The purpose of an accessibility model is to limit *where* and 
 *how* data can be seen/touched in order to preserve *program 
 invariants*.  So if you have an object (or tuple!) that records the 
 start and stop time of some process, it is probably a logical 
 invariant that the stop time is greater than or equal to the start 
 time.  For this reason, in a PL, you would encapsulate these fields 
 (attributes) and only provide controlled access to update them that 
 checks and preserves the invariant, *no matter who you are*.  You 
 don't want a superuser violating this invariant any more than Sue 
 User.
 
 Now you might object that constraints allow you to preserve 
 invariants as well, and indeed they do.  But constraints do not
 respect calling scope.  Suppose there is a process that needs to
 update the timestamps in a way that temporarily breaks the invariant
 but restores it afterwards.  

Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Tom Lane
Mark Cave-Ayland [EMAIL PROTECTED] writes:
 Alternatively, we might say that 64-bit CRC was overkill from 
 day one, and we'd rather get the additional 10% or 20% or so 
 speedup.  I'm kinda leaning in that direction, but only weakly.

 What would you need to persuade you either way? I believe that disk drives
 use CRCs internally to verify that the data has been read correctly from
 each sector. If the majority of the errors would be from a disk failure,
 then a corrupt sector would have to pass the drive CRC *and* the PostgreSQL
 CRC in order for an XLog entry to be considered valid. I would have thought
 the chances of this being able to happen would be reasonably small and so
 even with CRC32 this can be detected fairly accurately.

It's not really a matter of backstopping the hardware's error detection;
if we were trying to do that, we'd keep a CRC for every data page, which
we don't.  The real reason for the WAL CRCs is as a reliable method of
identifying the end of WAL: when the next record doesn't checksum you
know it's bogus.  This is a nontrivial point because of the way that we
re-use WAL files --- the pages beyond the last successfully written page
aren't going to be zeroes, they'll be filled with random WAL data.
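
To sketch that idea (illustrative only: the record layout below is a
simplification, not the real XLogRecord, and crc32_of() stands in for
whatever CRC routine is used):

#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a WAL record header: payload length plus
 * the CRC computed over the payload when the record was written. */
typedef struct
{
    uint32_t len;
    uint32_t crc;
} RecHdr;

/* Assumed to exist elsewhere; any CRC-32 routine will do. */
extern uint32_t crc32_of(const unsigned char *buf, size_t len);

/* Walk records in a buffer; the first record whose stored CRC doesn't
 * match its payload marks the end of valid WAL.  Leftover data from a
 * recycled segment fails the check with very high probability. */
static size_t
find_end_of_wal(const unsigned char *wal, size_t walsize)
{
    size_t pos = 0;

    while (pos + sizeof(RecHdr) <= walsize)
    {
        const RecHdr *hdr = (const RecHdr *) (wal + pos);

        if (hdr->len == 0 || hdr->len > walsize - pos - sizeof(RecHdr))
            break;
        if (crc32_of(wal + pos + sizeof(RecHdr), hdr->len) != hdr->crc)
            break;
        pos += sizeof(RecHdr) + hdr->len;
    }
    return pos;
}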

Personally I think CRC32 is plenty for this job, but there were those
arguing loudly for CRC64 back when we made the decision originally ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Fix for cross compilation

2005-05-31 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Montag, 30. Mai 2005 20:11 schrieb Tom Lane:
 Why don't we instead arrange to run it during install?

 It does currently run during the install.  How does that help?

I'm sorry, I meant later than it is now.  Presumably the person doing
the cross-compile has *some* access to the target hardware and could run
the zic program on that hardware.  During the regular install step of a
cross-compile build, instead of trying to run zic, we'd have to copy
zic, the timezone source data files, and probably a small shell script
into the install tree.  Then at some later point the user would have to
invoke the shell script on the target hardware in order to finish the
installation.  Kinda messy, I agree, but at least it will work reliably.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] ddl triggers

2005-05-31 Thread Barnali

Hi

I was wondering whether it will be useful to extend postgreSQL support 
to ddl triggers.

This may be useful for tracking and auditing purposes.

I am planning on prototyping this.
Would be interested to hear any comment.

Thanks

Barnali

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] A 2 phase commit weirdness

2005-05-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 No, the ERROR is in a completely unrelated transaction.  The scenario
 again is this:

 CREATE TABLE foo ();

 -- session 1:
 BEGIN;
 DROP TABLE foo;
 PREPARE TRANSACTION 'foo';

 -- session 2:
 SELECT * FROM foo;
 -- hangs

 -- session 1:
 COMMIT TRANSACTION 'foo';

 -- session 2 then gets:
 ERROR:  relation deleted while still in use

Oh.  Well, you get that now without any use of PREPARE; it's not clear
what else we could do, except possibly make the message a bit more
user-friendly.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Backslash handling in strings

2005-05-31 Thread Bruce Momjian
Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  The only thing I'm not clear on is what exactly is the use case for E''
  strings. That is, who do you expect to actually use them?
 
 The case that convinced me we need to keep some sort of backslash
 capability is this: suppose you want to put a string including a tab
 into your database.  Try to do it with psql:
   t=> insert into foo values ('<TAB>
 Guess what: you won't get anywhere, at least not unless you disable
 readline.  So it's nice to be able to use \t.
 
 There are related issues involving \r and \n depending on your platform.
 And this doesn't even scratch the surface of encoding-related funnies.
 
 So there's definitely a use-case for keeping the existing backslash
 behavior, and E'string' seems like a reasonable proposal for doing that
 without conflicting with the SQL spec.
 
 What I do not see at the moment is how we get there from here (ie,
 dropping backslashing in regular literals) without incurring tremendous
 pain --- including breaking all existing pg_dump files, introducing
 security holes and/or data corruption into many existing apps that are
 not presently broken, and probably some other ways of ruining your day.
 I'm quite unconvinced that this particular letter of the SQL spec is
 worth complying with ...

I think this is going to be like the Win32 port, where there is little
excitement from our existing users, but it is needed to grow our user
base.

I think the E'' is useful because it gives people a migration path for
the escapes they are already using, and the escape mechanism itself is
something useful to keep.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 It's not really a matter of backstopping the hardware's error detection;
 if we were trying to do that, we'd keep a CRC for every data page, which
 we don't.  The real reason for the WAL CRCs is as a reliable method of
 identifying the end of WAL: when the next record doesn't checksum you
 know it's bogus.  This is a nontrivial point because of the way that we
 re-use WAL files --- the pages beyond the last successfully written page
 aren't going to be zeroes, they'll be filled with random WAL data.

Is the random WAL data really the concern? It seems like a more reliable way
of dealing with that would be to just accompany every WAL entry with a
sequential id and stop when the next id isn't the correct one.

I thought the problem was that if the machine crashed in the middle of writing
a WAL entry you wanted to be sure to detect that. And there's no guarantee the
fsync will write out the WAL entry in order. So it's possible the end (and
beginning) of the WAL entry will be there but the middle may still be
unwritten.

The only truly reliable way to handle this would require two fsyncs per
transaction commit which would be really unfortunate.

 Personally I think CRC32 is plenty for this job, but there were those
 arguing loudly for CRC64 back when we made the decision originally ...

So given the frequency of database crashes and WAL replays, if having one
failed replay every few million years is acceptable, I think 32 bits is more
than enough: a corrupt record slips past an n-bit CRC with probability about
2^-n, which is roughly 2.3e-10 for 32 bits. Frankly I think 16 bits would be
enough.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] ddl triggers

2005-05-31 Thread Tom Lane
Barnali [EMAIL PROTECTED] writes:
 I was wondering whether it will be useful to extend postgreSQL support 
 to ddl triggers.

This has been proposed and rejected before ... see the archives.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Backslash handling in strings

2005-05-31 Thread Bruno Wolff III
On Tue, May 31, 2005 at 11:49:20 +0200,
  Dennis Bjorklund [EMAIL PROTECTED] wrote:
 On Tue, 31 May 2005, Tom Lane wrote:
 
  The case that convinced me we need to keep some sort of backslash
  capability is this: suppose you want to put a string including a tab
  into your database.  Try to do it with psql:
  t=> insert into foo values ('<TAB>
  Guess what: you won't get anywhere, at least not unless you disable
  readline.  So it's nice to be able to use \t.
 
 To insert a tab using readline you can press ESC followed by TAB. This
 works at least with readline as it is set up in redhat/fedora (and readline
 can be set up in 1000 different ways, so who knows how portable this is).

There are still advantages to having printable backslash-escaped characters
in strings that are saved to files. It makes it easier to see what is really
in the string, and such strings are less likely to get accidentally munged when
editing the file or moving it between systems with different line-termination
conventions.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] A 2 phase commit weirdness

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 10:44:58AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  No, the ERROR is in a completely unrelated transaction.  The scenario
  again is this:
 
  CREATE TABLE foo ();
 
  BEGIN;
  DROP TABLE foo;
  PREPARE TRANSACTION 'foo';
 
 
  SELECT * FROM foo;
  -- hangs
  COMMIT TRANSACTION 'foo';
  ERROR, relation deleted while still in
  use
 
 Oh.  Well, you get that now without any use of PREPARE; it's not clear
 what else we could do, except possibly make the message a bit more
 user-friendly.

Ah, you are right, sorry :-)  I was imagining I had to cope with that
but evidently not.

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
Granting software the freedom to evolve guarantees only different results,
not better ones. (Zygo Blaxell)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 It's not really a matter of backstopping the hardware's error detection;
 if we were trying to do that, we'd keep a CRC for every data page, which
 we don't.  The real reason for the WAL CRCs is as a reliable method of
 identifying the end of WAL: when the next record doesn't checksum you
 know it's bogus.

 Is the random WAL data really the concern? It seems like a more reliable way
 of dealing with that would be to just accompany every WAL entry with a
 sequential id and stop when the next id isn't the correct one.

We do that, too (the xl_prev links and page header addresses serve that
purpose).  But it's not sufficient given that WAL records can span pages
and therefore may be incompletely written.

 The only truly reliable way to handle this would require two fsyncs per
 transaction commit which would be really unfortunate.

How are two fsyncs going to be better than one?

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Oracle Style packages on postgres

2005-05-31 Thread Josh Berkus
Bruce,

  Added to TODO:
  
  * Add the features of packages
   o  Make private objects accessible only to objects in the same
 schema
 o  Allow current_schema.objname to access current schema objects
 o  Add session variables
 o  Allow nested schemas

Hmmm ... was there a reason we decided not to just make this explicitly tied 
to SQL2003 TYPES?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] ddl triggers

2005-05-31 Thread Josh Berkus
Tom,

  I was wondering whether it will be useful to extend postgreSQL support
  to ddl triggers.

 This has been proposed and rejected before ... see the archives.

Eh?  I thought it was a TODO.

--Josh

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] ddl triggers

2005-05-31 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 I was wondering whether it will be useful to extend postgreSQL support
 to ddl triggers.
 
 This has been proposed and rejected before ... see the archives.

 Eh?  I thought it was a TODO.

Or see the TODO list ... I was looking for some small evidence of
familiarity with the previous discussions ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

  Is the random WAL data really the concern? It seems like a more reliable way
  of dealing with that would be to just accompany every WAL entry with a
  sequential id and stop when the next id isn't the correct one.
 
 We do that, too (the xl_prev links and page header addresses serve that
 purpose).  But it's not sufficient given that WAL records can span pages
 and therefore may be incompletely written.

Right, so the problem isn't that there may be stale data that would be
indistinguishable from real data. The problem is that the real data may be
partially there but not all there.

  The only truly reliable way to handle this would require two fsyncs per
  transaction commit which would be really unfortunate.
 
 How are two fsyncs going to be better than one?

Well, you fsync the WAL entry, and only when that's complete do you flip a bit
marking the WAL entry as committed and fsync again.

Hm, you might need three fsyncs, one to make sure the bit isn't set before
writing out the WAL record itself.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Is the random WAL data really the concern? It seems like a more reliable way
 of dealing with that would be to just accompany every WAL entry with a
 sequential id and stop when the next id isn't the correct one.
 
 We do that, too (the xl_prev links and page header addresses serve that
 purpose).  But it's not sufficient given that WAL records can span pages
 and therefore may be incompletely written.

Actually, on reviewing the code I notice two missed bets here.

1. During WAL replay, we aren't actually verifying that xl_prev matches
the address of the prior WAL record.  This means we are depending only
on the page header addresses to make sure we don't replay stale WAL data
left over from the previous cycle of use of the physical WAL file.  This
is fairly dangerous, considering the likelihood of partial write of a
WAL page during a power failure: the first 512-byte sector(s) of a page
may have been updated but not the rest.  If an old WAL record happens to
start at exactly the sector boundary then we lose.

2. We store a separate CRC for each backup block attached to a WAL
record.  Therefore the same torn-page problem could hit us if a stale
backup block starts exactly at an intrapage sector boundary --- there is
nothing guaranteeing that the backup block really goes with the WAL
record.

#1 seems like a pretty critical, but also easily fixed, bug.  To fix #2
I suppose we'd have to modify the WAL format to store just one CRC
covering the whole of a WAL record and its attached backup block(s).

I think the reasoning behind the separate CRCs was to put a limit on
the amount of data guarded by one CRC, and thereby minimize the risk
of undetected errors.  But using the CRCs in this way is failing to
guard against exactly the problem that we want the CRCs to guard against
in the first place, namely torn WAL records ... so worrying about
detection reliability seems misplaced.

The odds of an actual failure from case #2 are fortunately not high,
since a backup block will necessarily span across at least one WAL page
boundary and so we should be able to detect stale data by noting that
the next page's header address is wrong.  (If it's not wrong, then at
least the first sector of the next page is up-to-date, so if there is
any tearing the CRC should tell us.)  Therefore I don't feel any need
to try to work out a back-patchable solution for #2.  But I do think we
ought to change the WAL format going forward to compute just one CRC
across a WAL record and all attached backup blocks.  There was talk of
allowing compression of backup blocks, and if we do that we could no
longer feel any certainty that a page crossing would occur.
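
In sketch form, the fix for #1 amounts to carrying the previous record's
address through the replay loop and refusing any record whose back-link
doesn't match it (names simplified; this is not the actual replay code):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t RecPtr;        /* stand-in for XLogRecPtr */

/* Hypothetical record header, showing only the field of interest. */
typedef struct
{
    RecPtr xl_prev;             /* address of the previous WAL record */
    /* ... length, CRC, and so on ... */
} RecHdr;

/* Returns false when the record cannot follow the one just replayed,
 * which rejects stale records left over from a recycled WAL file even
 * when the page header addresses happen to look plausible. */
static bool
replay_one(const RecHdr *rec, RecPtr rec_addr, RecPtr *last_addr)
{
    if (rec->xl_prev != *last_addr)
        return false;           /* stale leftover record: stop replay */
    /* ... CRC check and the actual redo work would go here ... */
    *last_addr = rec_addr;
    return true;
}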

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Marc G. Fournier



ERROR:  source database template1 is being accessed by other users

Why is this not allowed?  Not that there is generally a reason to be in 
template1, but am curious as to why it prevents a new DB from being 
created if someone is connected to it ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Marc G. Fournier


Just want to make sure that this is, in fact, what is expected:

client1: begin;
client1: update articles set some_col = foo where id = bar;
client2: update articles set some_col2 = foo2 where id = bar;
client1: update articles set some_col3 = foo where id = bar;
client1: ** deadlock **

client2 can't finish its 'transaction', and is therefore preventing 
client1 from continuing ... ?


Assuming that this is, in fact, 'normal', is there a way of breaking the 
deadlock?


If this is *not*, in fact, 'normal', is there any more information that I 
can provide to debug this?


This is a 7.4.6 database right now ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Manfred Koizar
On Tue, 31 May 2005 12:07:53 +0100, Mark Cave-Ayland
[EMAIL PROTECTED] wrote:
Perhaps Manfred can tell us the generator
polynomial that was used to create the lookup tables?

X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5
+ X^4 + X^2 + X + 1

- http://www.opengroup.org/onlinepubs/009695399/utilities/cksum.html

Or google for 04c11db7.
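
For the curious, here is a self-contained C sketch of the usual
byte-at-a-time table construction for that polynomial, using its
bit-reversed form 0xEDB88320.  (This is the textbook version, not
necessarily the exact PostgreSQL code; initial-value and final-XOR
conventions also vary between implementations.)

#include <stdint.h>
#include <stdio.h>

static uint32_t crc_table[256];

/* Each table entry is the CRC of its index, processed bit by bit
 * against the reflected polynomial. */
static void
build_crc_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320 : (c >> 1);
        crc_table[i] = c;
    }
}

/* Table-driven CRC over a byte buffer. */
static uint32_t
crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFF;

    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFF;
}

int
main(void)
{
    build_crc_table();
    /* Prints cbf43926, the standard CRC-32 check value for "123456789". */
    printf("%08x\n", crc32_bytes((const unsigned char *) "123456789", 9));
    return 0;
}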

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Consumer-grade vs enterprise-grade disk drives

2005-05-31 Thread Jeffrey W. Baker
On Mon, 2005-05-30 at 21:38 -0700, Luke Lonergan wrote:
 Tom,
 
 This is a story that is evolving.  Anyone else use StorageReview?  Great
 comprehensive drive benchmarks:
   http://www.storagereview.com/
 
 Check the comparisons between 15K RPM SCSI drives and the 2004 Western
 Digital 10K RPM SATA (Raptor) drives.  The Raptors are an interesting hybrid
 of SCSI-related tech and desktop tech, and were some of the first drives
 with SCSI-like command queuing TCQ/NCQ.

If we're looking at the same benchmark (File Server DriveMark), the
fastest SCSI disk is 65% faster than the fastest SATA disk.  The fastest
SCSI 10K disk is 25% faster than the SATA.

 I think the last remaining issue in moving to SATA for all enterprise use is
 the lack of decent SATA controllers, though 3Ware (http://www.3ware.com) is
 getting there:
   http://www.3ware.com/link/pdf/Serial-ATA.pdf
   http://www.3ware.com/products/benchmarks_sata.asp


The 3Ware controllers are probably the worst ones you can get for
database use.  Their caches are slow (I didn't even know that was
possible until I bought one), as are the XOR engines.

After reading this very comprehensive benchmark:

http://print.tweakers.net/?reviews/557

I purchased one of the Areca controllers with a large battery-backed
cache.  It is unholy fast.  I recommend it.

-jwb

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 ERROR:  source database template1 is being accessed by other users

 Why is this not allowed?

It's a rather lame attempt to ensure that you don't get a corrupt copy
due to the database changing while you copy it ... I'd like to find
a better way to do it ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Bruno Wolff III
On Tue, May 31, 2005 at 14:53:41 -0300,
  Marc G. Fournier [EMAIL PROTECTED] wrote:
 
 
 ERROR:  source database template1 is being accessed by other users
 
 Why is this not allowed?  Not that there is generally a reason to be in 
 template1, but am curious as to why it prevents a new DB from being 
 created if someone is connected to it ...

Because if someone changes something in the database while it is being
copied, the newly created database will probably be messed up.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 02:53:41PM -0300, Marc G. Fournier wrote:

 ERROR:  source database template1 is being accessed by other users
 
 Why is this not allowed?  Not that there is generally a reason to be in 
 template1, but am curious as to why it prevents a new DB from being 
 created if someone is connected to it ...

Because we copy the files directly.  So if someone modifies a file in
the middle of the copy, or, say, we copy an index and later the table
and someone inserts into the table in between, then the new database is
in an inconsistent state.

There has been talk about fixing this in the past, but the conclusion
seems to be that it's too hard to fix for too little benefit.

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 Just want to make sure that this is, in fact, what is expected:

 client1: begin;
 client1: update articles set some_col = foo where id = bar;
 client2: update articles set some_col2 = foo2 where id = bar;
 client1: update articles set some_col3 = foo where id = bar;
 client1: ** deadlock **

 client2 can't finish its 'transaction', and is therefore preventing 
 client1 from continuing ... ?

Hmm, I can see why client2 might be blocked by client1, but if client1
holds the lock it should be able to continue to update the table.

I take it from your title that this only happens if there's a tsearch2
index on the table?  Can you put together a test case?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 02:49:09PM -0400, Tom Lane wrote:
 Marc G. Fournier [EMAIL PROTECTED] writes:
  ERROR:  source database template1 is being accessed by other users
 
  Why is this not allowed?
 
 It's a rather lame attempt to ensure that you don't get a corrupt copy
 due to the database changing while you copy it ... I'd like to find
 a better way to do it ...

You sounded less enthusiastic about it in January:

http://archives.postgresql.org/pgsql-bugs/2005-01/msg00395.php

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
Investigación es lo que hago cuando no sé lo que estoy haciendo
(Wernher von Braun)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-05-31 Thread Tom Lane
It seems we are more or less agreed that 32-bit CRC ought to be enough
for WAL; and we also need to make a change to ensure that backup blocks
are positively linked to their parent WAL record, as I noted earlier
today.  So as long as we have to mess with the WAL record format, I was
wondering what else we could get done in the same change.

The TODO item that comes to mind immediately is "Compress WAL entries".
The TODO.detail file for that has a whole lot of ideas of various
(mostly high) levels of complexity, but one thing we could do fairly
trivially is to try to compress the page images that are dumped into WAL
to protect against partial-write problems.  After reviewing the old
discussion I still like the proposal I made:

 ... make the WAL writing logic aware of the layout
 of buffer pages --- specifically, to know that our pages generally
 contain an uninteresting hole in the middle, and not write the hole.
 Optimistically this might reduce the WAL data volume by something
 approaching 50%; though pessimistically (if most pages are near full)
 it wouldn't help much.

A more concrete version of this is: examine the page to see if the
pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
whether there is a run of consecutive zero bytes beginning at the
pd_lower position.  Omit any such bytes from what is written to WAL.
(This definition ensures that nothing goes wrong if the page does not
follow the normal page layout conventions: the transformation is
lossless no matter what, since we can always reconstruct the exact page
contents.)  The overhead needed is only 2 bytes to show the number of
bytes removed.
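
Concretely, the test might look something like this (a sketch of the
definition above; the constants are stand-ins for the real page-layout
macros):

#include <stdint.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define PAGE_HEADER_SIZE 24         /* stand-in for SizeOfPageHeaderData */

/* Length of the run of zero bytes starting at pd_lower, or 0 if the
 * page doesn't appear to follow the normal layout.  A result of 0
 * simply means the whole page gets written, so the transformation
 * stays lossless in every case. */
static uint16_t
wal_hole_length(const unsigned char *page, uint16_t pd_lower)
{
    uint32_t n = 0;

    if (pd_lower < PAGE_HEADER_SIZE || pd_lower >= BLCKSZ)
        return 0;
    while (pd_lower + n < BLCKSZ && page[pd_lower + n] == 0)
        n++;
    return (uint16_t) n;
}

The WAL copy would then consist of bytes [0, pd_lower), the two-byte
hole length, and the bytes from pd_lower + hole to the end of the page;
reconstruction just re-inserts that many zero bytes.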

The other alternatives that were suggested included running the page
contents through the same compressor used for TOAST, and implementing
a general-purpose run-length compressor that could get rid of runs of
zeroes anywhere on the page.  However, considering that the compression
work has to be done while holding WALInsertLock, it seems to me there
is a strong premium on speed.  I think that lets out the TOAST
compressor, which isn't amazingly speedy.  (Another objection to the
TOAST compressor is that it certainly won't win on already-compressed
toasted data.)  A run-length compressor would be reasonably quick but
I think that the omit-the-middle-hole approach gets most of the possible
win with even less work.  In particular, I think it can be proven that
omit-the-hole will actually require less CPU than now, since counting
zero bytes should be strictly faster than CRC'ing bytes, and we'll be
able to save the CRC work on whatever bytes we omit.

Any objections?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] CREATE DATABASE fails when template1 being accessed ...

2005-05-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Tue, May 31, 2005 at 02:49:09PM -0400, Tom Lane wrote:
 It's a rather lame attempt to ensure that you don't get a corrupt copy
 due to the database changing while you copy it ... I'd like to find
 a better way to do it ...

 You sounded less enthusiastic about it on january:
 http://archives.postgresql.org/pgsql-bugs/2005-01/msg00395.php

Well, I was expressing dissatisfaction with the obvious way of fixing
it.  If we knew a low-overhead way to fix it I'd be much more
enthusiastic ...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Jonah H. Harris

Hey everyone,

I'm sure this has been thought of but was wondering whether anyone had 
discussed the allowance of run-time block size specifications at the 
tablespace level?  I know that a change such as this would substantially 
impact buffer operations, transactions, access methods, the storage 
manager, and a lot of other stuff; however, it would give an 
administrator the ability to enhance performance for specific applications.


Arguably, one can set the block size at compile-time, but for a system 
running multiple databases it *may* be a nice feature.  Would it be used 
a lot?  Probably not.  Would I use it?  Certainly!  Would some of my 
clients use it?  Yes.


Perhaps a TODO item for some enterprising company to fund?

-Jonah


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Marc G. Fournier

On Tue, 31 May 2005, Tom Lane wrote:


Marc G. Fournier [EMAIL PROTECTED] writes:

Just want to make sure that this is, in fact, what is expected:



client1: begin;
client1: update articles set some_col = foo where id = bar;
client2: update articles set some_col2 = foo2 where id = bar;
client1: update articles set some_col3 = foo where id = bar;
client1: ** deadlock **



client2 can't finish its 'transaction', and is therefore preventing
client1 from continuing ... ?


Hmm, I can see why client2 might be blocked by client1, but if client1
holds the lock it should be able to continue to update the table.

I take it from your title that this only happens if there's a tsearch2
index on the table?  Can you put together a test case?


I haven't tried this myself, but the client wrote this very quick script 
that apparently recreates it every time:


test.sql:
---
/* load contrib/btree_gist.sql into database */

begin;

create table test (
 id serial primary key,
 age int
);
create index test_age_key on test using gist(age);

insert into test values (nextval('test_id_seq'), 1);

commit;

/*

client1:
begin; update test set age = 2 where id = 1;
client2:
update test set age = 2 where id = 1;
client1:
update test set age = 2 where id = 1;
...deadlock...

*/



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 I'm sure this has been thought of but was wondering whether anyone had 
 discussed the allowance of run-time block size specifications at the 
 tablespace level?

Can you produce any evidence whatsoever that this could be worth the cost?
Aside from the nontrivial development effort needed, there would be
runtime inefficiencies created --- for instance, inefficient use of
buffer pool storage because it'd no longer be true that any buffer could
hold any block.  Without some pretty compelling evidence, I wouldn't
even waste any time thinking about it ...

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Bricklen Anderson

Jonah H. Harris wrote:

Hey everyone,

I'm sure this has been thought of but was wondering whether anyone had 
discussed the allowance of run-time block size specifications at the 
tablespace level?  I know that a change such as this would substantially 
impact buffer operations, transactions, access methods, the storage 
manager, and a lot of other stuff; however, it would give an 
administrator the ability to enhance performance for specific applications.


Arguably, one can set the block size at compile-time, but for a system 
running multiple databases it *may* be a nice feature.  Would it be used 
a lot?  Probably not.  Would I use it?  Certainly!  Would some of my 
clients use it?  Yes.


Perhaps a TODO item for some enterprising company to fund?

-Jonah


Have you used Oracle's version as well?


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Jonah H. Harris

Yes,

That is what I/my clients have been discussing.  It is a nifty 
performance feature.


Bricklen Anderson wrote:


Jonah H. Harris wrote:


Hey everyone,

I'm sure this has been thought of but was wondering whether anyone 
had discussed the allowance of run-time block size specifications at 
the tablespace level?  I know that a change such as this would 
substantially impact buffer operations, transactions, access methods, 
the storage manager, and a lot of other stuff, however it would give 
an administrator the ability to inhance performance for specific 
applications.


Arguably, one can set the block size at compile-time, but for a 
system running multiple databases it *may* be a nice feature.  Would 
it be used a lot?  Probably not.  Would I use it?  Certainly!  Would 
some of my clients use it?  Yes.


Perhaps a TODO item for some enterprising company to fund?

-Jonah



Have you used Oracle's version as well?




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 02:55:29PM -0600, Jonah H. Harris wrote:
 Hey everyone,
 
 I'm sure this has been thought of but was wondering whether anyone had 
 discussed the allowance of run-time block size specifications at the 
 tablespace level?  I know that a change such as this would substantially 
 impact buffer operations, transactions, access methods, the storage 
 manager, and a lot of other stuff; however, it would give an 
 administrator the ability to enhance performance for specific applications.

The problem I see with this proposal is that the buffer manager knows
how to handle only equally-sized pages.  And the shared memory stuff
gets sized according to size * num_pages.  So what happens if a certain
tablespace A with pagesize=X gets to have a lot of its pages cached,
evicting pages from tablespace B with pagesize=Y, where Y > X?

While I think it could be workable to make the buffer manager handle
variable-sized pages, it could prove difficult to handle the shared
memory.  (We can't resize it while the server is running.)
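
To illustrate the first point: with a single block size the buffer pool
is a flat array, and mapping a buffer number to its memory is trivial
(a simplified sketch of the idea, not the actual bufmgr code):

#include <stddef.h>

#define BLCKSZ 8192             /* one size for every buffer */

static char *BufferBlocks;      /* NBuffers * BLCKSZ bytes of shared memory */

/* Any buffer can hold any page precisely because every slot has the
 * same size; with per-tablespace block sizes both this constant-stride
 * arithmetic and the up-front sizing of the shared segment break down. */
static char *
buffer_block(int buf_id)
{
    return BufferBlocks + (size_t) buf_id * BLCKSZ;
}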

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
La principal característica humana es la tontería
(Augusto Monterroso)

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 On Tue, 31 May 2005, Tom Lane wrote:
 I take it from your title that this only happens if there's a tsearch2
 index on the table?  Can you put together a test case?

 I haven't tried this myself, but the client wrote this very quick script 
 that apparently recreates it every time:

So it does.  What's happening is that since client2 wants to update the
same row client1 already did, client2 blocks for client1 to commit and
release the row lock.  Which is reasonable.  But for non-concurrent
indexes (ie, gist and rtree), we take out AccessExclusiveLock at the
start of the statement, and release it again at the end of the
statement.  So client1 released its index lock, and now client2 is
holding it while blocked on the row, and then client1 needs it back
in order to do its second update.

AFAICS, a near-term fix for this would require either that

(1) we don't release exclusive index locks at end of statement, but
hold them till end of transaction; then client1 would still be holding
the index lock and client2 would be blocked on that, not on the row.
Or

(2) we acquire and release the index lock for each *tuple* rather than
each statement.  Then client2 doesn't hold the index lock while it's
waiting for the row lock to clear.

Neither of these cures sounds attractive :-(.  I think #1 would probably
do as much to create deadlock cases as to prevent them.  #2 would avoid
the deadlock but the performance cost would be high.

The real solution is to upgrade GIST to be concurrent.  Oleg and Teodor
have made some noises about that in the past, but nothing's been done
about it that I've heard of.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] Major flood of mail to lists ...

2005-05-31 Thread Marc G. Fournier


Due to moderator error (namely, mine), several hundred messages (spread 
across all the lists) were just approved ...


Sorry for all the incoming junk :(


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Oleg Bartunov

On Tue, 31 May 2005, Tom Lane wrote:


The real solution is to upgrade GIST to be concurrent.  Oleg and Teodor
have made some noises about that in the past, but nothing's been done
about it that I've heard of.


unfortunately, we still couldn't find 2-3 months for dedicated work on
concurrency & recovery for GiST. I'm trying to find support here in Russia
for our work.



regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[HACKERS] Physical Tlist optimization still possible?

2005-05-31 Thread Simon Riggs

If a table is created as WITHOUT OIDS, is it still possible to take
advantage of the physical tlist optimization?

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Marc G. Fournier

On Tue, 31 May 2005, Tom Lane wrote:

(2) we acquire and release the index lock for each *tuple* rather than 
each statement.  Then client2 doesn't hold the index lock while it's 
waiting for the row lock to clear.


Neither of these cures sounds attractive :-(.  I think #1 would probably 
do as much to create deadlock cases as to prevent them.  #2 would avoid 
the deadlock but the performance cost would be high.


But ... this wouldn't affect SELECT operations, would it?  And only GiST 
related operations?  Would the performance loss be noticeable?  And, would 
the performance cost not be worth getting rid of the deadlocks, until the 
concurrency issues can be properly dealt with?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[HACKERS] NOLOGGING option, or ?

2005-05-31 Thread Simon Riggs
Recent test results have shown a substantial performance improvement
(+25%) if WAL logging is disabled for large COPY statements. This is to
be expected, though it has a price attached: losing the ability to
crash-recover data loaded in this manner.

There are two parts to this proposal. First, when and whether to do this
at all. Second, syntax and invocation.

Why?

Performance.

The performance gain has a price and so should only be enabled if
requested explicitly by the user. It is up to the user whether they
accept this price, since in many useful cases it is a small price
against a huge saving.

The price is that if a crash occurs, then any table that was not empty
to begin with would not be in a transactionally consistent state
following crash recovery. It may have data in it, but it would be up to
the user to determine whether that was satisfactory or not. It could be
possible to sense what to do in this situation automatically, by putting
the table into a needs-recovery type state... I don't propose to handle
this *at this stage*.

Syntax and invocation:

Previously I had discussed adding a NOLOGGING option onto both COPY and
CREATE TABLE AS SELECT that would bypass the creation of wal logging
data. That is still a possibility, but would require manual code changes
to much of the SQL submitted.

Now, I would like to discuss adding an enable_logging USERSET GUC that
would apply *only* to COPY and CREATE TABLE AS SELECT. The default
would be true, so that WAL logging is bypassed only on explicit request.
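
Roughly, the bulk-load insert path would consult the flag like this (a
sketch assuming a bool GUC variable named enable_logging; the real
cut-in point and the recovery semantics need far more care):

#include "postgres.h"
#include "access/xlog.h"

extern bool enable_logging;     /* proposed GUC variable (hypothetical) */

/*
 * Sketch: skip WAL for the data pages of a bulk load.  When WAL is
 * skipped, the relation must be fsync'd before commit, and the loaded
 * data is not crash-recoverable (and breaks PITR).
 */
static void
bulk_insert_tuple(void)
{
    /* ... place the tuple on its heap page ... */
    if (enable_logging)
    {
        /* build the XLogRecData chain and call XLogInsert() as usual */
    }
    /* else: arrange to sync the whole relation at end of statement */
}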

How can we gain this performance benefit for those willing to accept the
restrictions imposed?

Your comments are sought and are most welcome.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Jonah H. Harris

Tom,

You and I both know that depending on the application and data, 
different block sizes are beneficial.  As for actual statistics on the 
overhead, I don't know what I can give you.


I can provide stats from an application which fits the case for multiple 
block sizes on Oracle, but Oracle accounts for this overhead anyway.  I 
can give you academic research studies, which may be fairly unreliable 
in a real-world setting.


I don't disagree at all that supporting multiple block sizes would be 
one big PITA to implement and that it would add overhead.  I am just 
saying that it would be a useful feature for *some* people.  Granted, 
this isn't a large population (at this point in time), but applications 
have been written and optimized using these features.


You are all really smart and I'm just putting this suggestion out there 
to stew on.  I don't want you guys to think that I'm just throwing out 
every Oracle feature I can find, just that when I'm working on an 
application which benefits from a feature which would similarly be 
useful in PostgreSQL, I suggest it.  You guys have been working on pgsql 
far longer than I, so for my ideas, chew 'em up and spit 'em out, I 
don't take offense.  As I stated initially, this wouldn't even be a 
low-priority thing, just a nicety that IMHO would be well-placed in a 
TODO (possibly as investigate usability and feasibility of block sizes 
by tablespace).


Tom, I respect your insight and would be more than happy to get you any 
information you'd like concerning this subject or any other I may 
suggest.  I don't want to waste your time, so if there is anything in 
particular you want to see, just let me know.  Thanks.


-Jonah



Tom Lane wrote:


Jonah H. Harris [EMAIL PROTECTED] writes:
 

I'm sure this has been thought of but was wondering whether anyone had 
discussed the allowance of run-time block size specifications at the 
tablespace level?
   



Can you produce any evidence whatsoever that this could be worth the cost?
Aside from the nontrivial development effort needed, there would be
runtime inefficiencies created --- for instance, inefficient use of
buffer pool storage because it'd no longer be true that any buffer could
hold any block.  Without some pretty compelling evidence, I wouldn't
even waste any time thinking about it ...

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
 




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Simon Riggs
On Tue, 2005-05-31 at 12:27 -0400, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  Is the random WAL data really the concern? It seems like a more reliable 
  way
  of dealing with that would be to just accompany every WAL entry with a
  sequential id and stop when the next id isn't the correct one.
  
  We do that, too (the xl_prev links and page header addresses serve that
  purpose).  But it's not sufficient given that WAL records can span pages
  and therefore may be incompletely written.
 
 Actually, on reviewing the code I notice two missed bets here.
 
 1. During WAL replay, we aren't actually verifying that xl_prev matches
 the address of the prior WAL record.  This means we are depending only
 on the page header addresses to make sure we don't replay stale WAL data
 left over from the previous cycle of use of the physical WAL file.  This
 is fairly dangerous, considering the likelihood of partial write of a
 WAL page during a power failure: the first 512-byte sector(s) of a page
 may have been updated but not the rest.  If an old WAL record happens to
 start at exactly the sector boundary then we lose.
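
The fix for #1 is small in code terms; in essence, replay just needs a
check like this against the location of the record it read previously
(a sketch; the 8.0-era two-field XLogRecPtr is assumed, and the exact
placement in ReadRecord is glossed over):

#include "postgres.h"
#include "access/xlog.h"

/*
 * Require each record's back-link to match where the previous record
 * actually started.  A mismatch means we have hit stale data left in a
 * recycled segment, i.e. the end of valid WAL.
 */
static bool
xlog_prev_link_ok(XLogRecord *record, XLogRecPtr prevRecPtr)
{
    return record->xl_prev.xlogid == prevRecPtr.xlogid &&
           record->xl_prev.xrecoff == prevRecPtr.xrecoff;
}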

Hmmm. I seem to recall asking myself why xl_prev existed if it wasn't
used, but passed that by. Damn.

 2. We store a separate CRC for each backup block attached to a WAL
 record.  Therefore the same torn-page problem could hit us if a stale
 backup block starts exactly at an intrapage sector boundary --- there is
 nothing guaranteeing that the backup block really goes with the WAL
 record.
 
 #1 seems like a pretty critical, but also easily fixed, bug.  To fix #2
 I suppose we'd have to modify the WAL format to store just one CRC
 covering the whole of a WAL record and its attached backup block(s).
 
 I think the reasoning behind the separate CRCs was to put a limit on
 the amount of data guarded by one CRC, and thereby minimize the risk
 of undetected errors.  But using the CRCs in this way is failing to
 guard against exactly the problem that we want the CRCs to guard against
 in the first place, namely torn WAL records ... so worrying about
 detection reliability seems misplaced.
 
 The odds of an actual failure from case #2 are fortunately not high,
 since a backup block will necessarily span across at least one WAL page
 boundary and so we should be able to detect stale data by noting that
 the next page's header address is wrong.  (If it's not wrong, then at
 least the first sector of the next page is up-to-date, so if there is
 any tearing the CRC should tell us.)  Therefore I don't feel any need
 to try to work out a back-patchable solution for #2.  But I do think we
 ought to change the WAL format going forward to compute just one CRC
 across a WAL record and all attached backup blocks.  There was talk of
 allowing compression of backup blocks, and if we do that we could no
 longer feel any certainty that a page crossing would occur.
 
 Thoughts?

PreAllocXLog was already a reason to have somebody prepare new xlog
files ahead of them being used. Surely the right solution here is to
have that agent prepare fresh/zeroed files prior to them being required.
That way no stale data can ever occur and both of these bugs go
away 

Fixing that can be backpatched so that the backend that switches files
can do the work, rather than bgwriter [ or ?].
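
For illustration, zero-preparing a segment is nothing more than this
(plain POSIX calls, abbreviated error handling; PostgreSQL already
zero-fills brand-new segments in XLogFileInit, it is recycled segments
that keep their old contents):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SEG_SIZE  (16 * 1024 * 1024)    /* XLogSegSize */
#define CHUNK     8192

/* Write a segment full of zeroes so stale records can never be misread. */
static int
zero_fill_segment(const char *path)
{
    char    buf[CHUNK];
    long    written;
    int     fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 0, sizeof(buf));
    for (written = 0; written < SEG_SIZE; written += sizeof(buf))
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

The doubled write traffic this implies is the obvious objection (see
Tom's reply downthread).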

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-05-31 Thread Simon Riggs
On Tue, 2005-05-31 at 16:26 -0400, Tom Lane wrote:
 The TODO item that comes to mind immediately is Compress WAL entries.
 A more concrete version of this is: examine the page to see if the
 pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
 whether there is a run of consecutive zero bytes beginning at the
 pd_lower position.  Omit any such bytes from what is written to WAL.
 (This definition ensures that nothing goes wrong if the page does not
 follow the normal page layout conventions: the transformation is
 lossless no matter what, since we can always reconstruct the exact page
 contents.)  The overhead needed is only 2 bytes to show the number of
 bytes removed.
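
In code form the detection side is tiny (a sketch with assumed names; the
real patch must also store the removed-byte count and re-expand the page
at replay):

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * "Omit the hole": if pd_lower is sane, count the run of zero bytes
 * starting there.  Those bytes can be left out of the backup block; the
 * 2-byte count lets replay reconstruct the page exactly.  A page that
 * does not follow the standard layout simply yields a zero-length run,
 * so the transformation stays lossless.
 */
static uint16
backup_block_hole_length(Page page)
{
    PageHeader  hdr = (PageHeader) page;
    uint16      lower = hdr->pd_lower;
    uint16      len = 0;

    if (lower >= SizeOfPageHeaderData && lower < BLCKSZ)
    {
        while (lower + len < BLCKSZ && ((char *) page)[lower + len] == 0)
            len++;
    }
    return len;     /* log BLCKSZ - len bytes of page data, plus the count */
}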
 
 The other alternatives that were suggested included running the page
 contents through the same compressor used for TOAST, and implementing
 a general-purpose run-length compressor that could get rid of runs of
 zeroes anywhere on the page.  However, considering that the compression
 work has to be done while holding WALInsertLock, it seems to me there
 is a strong premium on speed.  I think that lets out the TOAST
 compressor, which isn't amazingly speedy.  (Another objection to the
 TOAST compressor is that it certainly won't win on already-compressed
 toasted data.)  A run-length compressor would be reasonably quick but
 I think that the omit-the-middle-hole approach gets most of the possible
 win with even less work.  In particular, I think it can be proven that
 omit-the-hole will actually require less CPU than now, since counting
 zero bytes should be strictly faster than CRC'ing bytes, and we'll be
 able to save the CRC work on whatever bytes we omit.
 
 Any objections?

None: completely agree with your analysis. Sounds great.

 It seems we are more or less agreed that 32-bit CRC ought to be enough
 for WAL; and we also need to make a change to ensure that backup blocks
 are positively linked to their parent WAL record, as I noted earlier
 today.  So as long as we have to mess with the WAL record format, I was
 wondering what else we could get done in the same change.

Is this a change that would be backpatched as you suggested previously?
It seems a rather large patch to change three things at once. Can the
backpatch wait until 8.1 has gone through beta to allow the changes to
be proven?

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Tablespace-level Block Size Definitions

2005-05-31 Thread Simon Riggs
On Tue, 2005-05-31 at 17:05 -0400, Tom Lane wrote:
 Jonah H. Harris [EMAIL PROTECTED] writes:
  I'm sure this has been thought of but was wondering whether anyone had 
  discussed the allowance of run-time block size specifications at the 
  tablespace level?
 
 Can you produce any evidence whatsoever that this could be worth the cost?
 Aside from the nontrivial development effort needed, there would be
 runtime inefficiencies created --- for instance, inefficient use of
 buffer pool storage because it'd no longer be true that any buffer could
 hold any block.  Without some pretty compelling evidence, I wouldn't
 even waste any time thinking about it ...

DB2 has had multiple page size support for some time, though the default
was always 4KB. They have just reintroduced the option to have a single
page size > 4KB across the database. They would not have done this if there
were not clear evidence that multiple block sizes are inefficient in some
reasonably common cases.

I must admit that when I came here, I thought the same as Jonah. But I
haven't seen any recent evidence for any benefit. It's a real pain trying
to test this and very difficult to change once it's been set up. There's a
great deal more benefit to be had from many other areas, IMHO.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Consumer-grade vs enterprise-grade disk drives

2005-05-31 Thread Luke Lonergan

Jeff,
 
 If we're looking at the same benchmark (File Server DriveMark), the
 fastest SCSI disk is 65% faster than the fastest SATA disk.  The fastest
 SCSI 10K disk is 25% faster than the SATA.

I think it's misleading to compare drives on the basis of one benchmark.
One of the things I like a lot about storage review is the Head to Head
Comparison feature.  If you use that, you see a completely different story:
  
http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=237&devID_1=263&devID_2=259&devCnt=3

In particular, note that the 2004-era Raptor is somewhere between the fastest
10K RPM SCSI drive and the fastest (2005) 15K RPM SCSI drive on all but the
tests where TCQ is dominant (the two server benchmarks).

So, if you are doing highly concurrent random IOs, you will still prefer the
more mature TCQ support in the SCSI drives.  If you are doing large block
IOs, you will favor the SATA drive for its faster transfer rate/RPM/$.

In the words of the editors in June 2004:

 In the end, the potential for SATA to invade the entry- and mid-level
server market is there. The performance is definitely there. If the Raptor's
reliability proves comparable to the competition and if the
infrastructure/support hardware surface, WD will have a viable contender.

 The 3Ware controllers are probably the worst ones you can get for
 database use.  Their caches are slow (I didn't even know that was
 possible until I bought one), as are the XOR engines.
 
 After reading this very comprehensive benchmark:
 
 http://print.tweakers.net/?reviews/557

Great review - thanks for the link.  I'm not seeing as harsh a conclusion
about the 3Ware 9500 adapter as you - in fact, it's at the top of the IO/s
on IOMeter Fileserver simulation and it's in the pack toward the top on most
all of the other benchmarks.  The only benchmark it lags significantly on is
the RAID5 write test, which correlates with your point about a slow XOR
engine.

The Areca definitely looks to be the fastest with the highest scaling, but
not the fastest at writes either (#3 out of 7, the 3Ware is #4 out of 7).

 I purchased one of the Areca controllers with a large battery-backed
 cache.  It is unholy fast.  I recommend it.

Cool!  Never heard of Areca before this - what did you pay for it?  How many
drives?  What kind of workload do you have?  How do you measure performance?

- Luke 



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] audit facilities?

2005-05-31 Thread Satoshi Nagayasu
Hi all,

I'm now interested in audit facilities for database access.
Is anyone thinking about or working on audit features?

-- 
NAGAYASU Satoshi [EMAIL PROTECTED]
OpenSource Development Center,
NTT DATA Corp. http://www.nttdata.co.jp/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] audit facilities?

2005-05-31 Thread Hiroshi Saito
From: "Satoshi Nagayasu" [EMAIL PROTECTED]

 Hi all,
 
 I'm now interested in audit facilities for database access.
 Is anyone thinking about or working on audit features?

http://archives.postgresql.org/pgsql-general/2004-08/msg01439.php
I remember that this approach was being worked on.
However, I would hope for this as a function of the PostgreSQL core itself.

regards,
Hiroshi Saito

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Christopher Kings-Lynne

The real solution is to upgrade GIST to be concurrent.  Oleg and Teodor
have made some noises about that in the past, but nothing's been done
about it that I've heard of.


This whole GiST concurrency thing really needs to be looked at :(  There 
is so much cool stuff that can be done with it, and now that there are 
some fantastic uses of it, such as tsearch2 and pg_trgm, people are 
starting to run into these walls :(


WAL logging would also be nice, especially since Tom mentioned in the 
manual that it was just a Small Matter Of Programming :D


Chris


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Christopher Kings-Lynne

unfortunately, we still couldn't find 2-3 months for dedicated work on
concurrency & recovery for GiST. I'm trying to find support here in Russia
for our work.


How much money (US Dollars) would you need?

Chris


---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Joshua D. Drake

Christopher Kings-Lynne wrote:

unfortunately, we still couldn't find 2-3 months for dedicated work on
concurrency & recovery for GiST. I'm trying to find support here in Russia
for our work.



How much money (US Dollars) would you need?



Command Prompt could jump on that as well. We could help sponsor a bit.

Sincerely,

Joshua D. Drake



Chris


---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org



--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Neil Conway
On Wed, 2005-06-01 at 09:30 +0800, Christopher Kings-Lynne wrote:
 This whole GiST concurrency thing really needs to be looked at :(

I spent some time looking at it toward the end of last year, but
unfortunately I didn't have enough time to devote to it to get a working
implementation (it is quite a lot of work). I'm still interested in
working on it, although I won't have any time until after 8.1.

-Neil



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] audit facilities?

2005-05-31 Thread Satoshi Nagayasu
Hiroshi Saito wrote:
 http://archives.postgresql.org/pgsql-general/2004-08/msg01439.php
 I remember that this approach was being worked on.

Thanks for the information.

 However, I would hope for this as a function of the PostgreSQL core itself.

I hope they go into contrib...

I think pgFoundry is the best place to collect or develop
the audit stuff.

Anyone interested in this?
-- 
NAGAYASU Satoshi [EMAIL PROTECTED]
OpenSource Development Center,
NTT DATA Corp. http://www.nttdata.co.jp/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Christopher Kings-Lynne

How much money (US Dollars) would you need?



Command Prompt could jump on that as well. We could help sponsor a bit.


Maybe we could start a funding project for it?

USD converts to lots of roubles, I assume, so it'd be good like that. 
Perhaps someone (not me - too busy) on the PostgreSQL Foundation could 
organise contacting some companies to see if we can get a little out of 
several :)


ie. Command Prompt, Fujitsu, Pervasive, EnterpriseDB, etc.?

Oleg & Teodor: how about letting us know what funding target we'd need?

Chris


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-05-31 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Is this a change that would be backpatched as you suggested previously?

I don't think we can backpatch any of these items, since they involve
changes in the on-disk file format.  I was thinking of them as CVS HEAD
changes only.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] audit facilities?

2005-05-31 Thread Satoshi Nagayasu
Finally, I found the tablelog project at pgFoundry...

http://pgfoundry.org/projects/tablelog/

Satoshi Nagayasu wrote:
 Hiroshi Saito wrote:
 http://archives.postgresql.org/pgsql-general/2004-08/msg01439.php
 I remember that this approach was being worked on.
 
 Thanks for the information.
 
 However, I would hope for this as a function of the PostgreSQL core itself.
 
 I hope they go into contrib...
 
 I think pgFoundry is the best place to collect or develop
 the audit stuff.
 
 Anyone interested in this?

-- 
NAGAYASU Satoshi [EMAIL PROTECTED]
OpenSource Development Center,
NTT DATA Corp. http://www.nttdata.co.jp/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Physical Tlist optimization still possible?

2005-05-31 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 If a table is created as WITHOUT OIDS, is it still possible to take
 advantage of the physical tlist optimization?

Um ... it depends.  There are cases where you have to have an OID,
and cases where you have to not have one.  See ExecContextForcesOids.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-05-31 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Hmmm. I seem to recall asking myself why xl_prev existed if it wasn't
 used, but passed that by. Damn.

I couldn't believe it'd been overlooked this long, either.  It's the
sort of thing that you assume got done the first time :-(

 PreAllocXLog was already a reason to have somebody prepare new xlog
 files ahead of them being used. Surely the right solution here is to
 have that agent prepare fresh/zeroed files prior to them being required.

Uh, why?  That doubles the amount of physical I/O required to maintain
the WAL, and AFAICS it doesn't really add any safety that we can't get
in a more intelligent fashion.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] NOLOGGING option, or ?

2005-05-31 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Recent test results have shown a substantial performance improvement
 (+25%) if WAL logging is disabled for large COPY statements.

How much of that is left after we fix the 64-bit-CRC issue?

 Now, I would like to discuss adding an enable_logging USERSET GUC,

[ fear and loathing ... ]

I don't like the idea of a GUC at all, and USERSET is right out.
I think it would have to be system-wide (cf fsync) to be even
implementable let alone somewhat predictable.  Even if it could
be done per-backend with reasonable semantics, random users should
not get to make that decision --- it should be the DBA's call,
which means it needs at least SUSET permissions.
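
(For reference, the permission level is a single field in the guc.c
tables; a sketch of a config_bool entry, with the 8.0-era layout recalled
from memory and a hypothetical variable name:)

/* Hypothetical guc.c entry: superuser-settable, WAL emission on by default. */
{
    {"enable_logging", PGC_SUSET, WAL_SETTINGS,
        gettext_noop("Emits WAL for COPY and CREATE TABLE AS SELECT."),
        NULL
    },
    &enable_logging,
    true, NULL, NULL
},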

BTW, I'm sure you are the last one who needs to be reminded that
any such thing breaks PITR completely.  Which is surely sufficient
reason not to let it be USERSET.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Interval-day proposal

2005-05-31 Thread Michael Glaesemann

On May 31, 2005, at 12:48 AM, Tom Lane wrote:

 Michael Glaesemann [EMAIL PROTECTED] writes:
  tm_mday is an int value, which is only guaranteed to be 2
  bytes (though it may be larger), if I understand correctly.

 Actually, practically all of the Postgres code assumes int is at least
 32 bits.  Feel free to change pg_tm's field to be declared int32 instead
 of just int if that bothers you, but it is really quite academic.

Thanks for the clarification. My instinct would be to change so that  
it's no longer just an assumption. Is there any benefit to changing  
the other pg_tm int fields to int32? I imagine int is used quite a  
bit throughout the code, and I'd think assuming 32-bit ints would  
have bitten people in the past if it were invalid, so perhaps  
changing them is unnecessary.

 I'd make the on-disk field an int32, taking the struct to 16 bytes.

Will do.
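
For reference, the 16-byte struct I have in mind would look something
like this (field names are provisional, not a settled layout; int64 time
assumes integer datetimes, a float8 of the same size otherwise):

#include "postgres.h"   /* int32/int64 typedefs */

/*
 * Tri-partitioned interval: months and days are stored apart from the
 * sub-day time field, since a day is not always 24 hours once DST is
 * considered.  8 + 4 + 4 = 16 bytes on disk.
 */
typedef struct Interval
{
    int64   time;   /* all units below a day, in microseconds */
    int32   day;    /* days, kept separate for DST-aware arithmetic */
    int32   month;  /* months and years */
} Interval;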

Thanks for your comments.

Michael Glaesemann
grzm myrealbox com



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] NOLOGGING option, or ?

2005-05-31 Thread Alvaro Herrera
On Tue, May 31, 2005 at 10:47:30PM -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Recent test results have shown a substantial performance improvement
  (+25%) if WAL logging is disabled for large COPY statements.

 BTW, I'm sure you are the last one who needs to be reminded that
 any such thing breaks PITR completely.  Which is surely sufficient
 reason not to let it be USERSET.

This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
the fact that the command was executed, so the replayer could execute
the same command again.

Of course, this handwaving doesn't explain how the system in recovery
mode would be able to execute a full query to reconstruct the table, and
also it doesn't say a lot about the extra complexity at the source level
to implement this option.

For people loading big files into the database, maybe we could think
about a command to let a file be loaded directly as initial table
content.  So all that we'd need is a program to write the file, which
could be done externally.  (The filewriter would have to have access to
the catalog and input functions for the involved types, though I think
for simple types it would be straightforward ... we could write frozen
tuples to avoid TransactionId problems.)
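
The frozen-tuple part, at least, is simple in principle (a sketch;
assumes the filewriter is built against the backend headers):

#include "postgres.h"
#include "access/htup.h"
#include "access/transam.h"

/*
 * Stamp a constructed tuple as frozen: visible to every snapshot and
 * immune to transaction-ID wraparound, so no live xmin needs to exist
 * when the file is later attached as table content.
 */
static void
mark_tuple_frozen(HeapTuple tuple)
{
    HeapTupleHeaderSetXmin(tuple->t_data, FrozenTransactionId);
    HeapTupleHeaderSetXmax(tuple->t_data, InvalidTransactionId);
    tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;   /* hint: xmin valid */
}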

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
www.google.com: command-line interface for the web.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Interval-day proposal

2005-05-31 Thread Michael Glaesemann


On May 31, 2005, at 1:40 AM, Tom Lane wrote:

 Josh Berkus josh@agliodbs.com writes:
  Unfortunately, it appears that tri-partitioning INTERVAL (year/month;
  week/day; hour/minute/second) is a violation of the SQL spec, which has
  only the two partitions (year/month; week/day/hour/minute/second).

 I think it's an extension of the spec, not a violation.  In
 particular, if you were working in a daylight-savings-less timezone,
 you could not tell the difference (could you?)

I've started working on this change, and one difference has shown up  
immediately in the regression tests. v8.0.3 currently returns:


  SELECT INTERVAL '10 years -11 month -12 days +13:14' AS "9 years...";
  9 years...
  --
!  9 years 1 mon -11 days -10:46:00
  (1 row)

With my first round of changes,

  SELECT INTERVAL '10 years -11 month -12 days +13:14' AS "9 years...";
  9 years...
  --
!  9 years 1 mon -12 days +13:14:00
  (1 row)

These are equivalent in both CVS and my branch, as '1 day'::interval  
= '24 hours'::interval. I haven't checked the SQL spec yet (and  
intend to do so), but is there a canonical form for intervals that we  
need to return? I can imagine there might be, and if the spec  
considers only months and time, I could see where they might want  
days and time to have the same sign, and put results in simplest  
form. Even if the spec doesn't require it, the behavior is  
definitely changed even outside of DST-aware code.


-- v8.0.3
test=# select '9 years 1 mon -11 days -10:46:00'::interval;
 interval
--
9 years 1 mon -11 days -10:46:00
(1 row)

test=# select '9 years 1 mon -12 days +13:14:00'::interval;
 interval
--
9 years 1 mon -11 days -10:46:00
(1 row)

test=# select '25 hours'::interval;
interval

1 day 01:00:00
(1 row)

-- new interval code
test=# select '9 years 1 mon -11 days -10:46:00'::interval;
 interval
--
9 years 1 mon -11 days -10:46:00
(1 row)

test=# select '9 years 1 mon -12 days +13:14:00'::interval;
 interval
--
9 years 1 mon -12 days +13:14:00
(1 row)

test=# select '25 hours'::interval;
interval
--
25:00:00
(1 row)

I'll be digging into the spec later and post what I find. Thoughts?

Michael Glaesemann
grzm myrealbox com



---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] NOLOGGING option, or ?

2005-05-31 Thread Neil Conway
On Wed, 2005-06-01 at 00:40 -0400, Alvaro Herrera wrote:
 This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
 the fact that the command was executed, so the replayer could execute
 the same command again.
 
 Of course, this handwaving doesn't explain how the system in recovery
 mode would be able to execute a full query to reconstruct the table

There's also the typical problem with this kind of approach: how do you
handle non-deterministic queries? (e.g. CREATE TABLE ... AS SELECT
random(), gettimeofday(), some_func(t1.x) FROM t1 LIMIT 5)

-Neil



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Interval-day proposal

2005-05-31 Thread Tom Lane
Michael Glaesemann [EMAIL PROTECTED] writes:
 On May 31, 2005, at 12:48 AM, Tom Lane wrote:
 Actually, practically all of the Postgres code assumes int is at least
 32 bits.  Feel free to change pg_tm's field to be declared int32  
 instead
 of just int if that bothers you, but it is really quite academic.

 Thanks for the clarification. My instinct would be to change so that  
 it's no longer just an assumption. Is there any benefit to changing  
 the other pg_tm int fields to int32? I imagine int is used quite a  
 bit throughout the code, and I'd think assuming 32-bit ints would  
 have bitten people in the past if it were invalid, so perhaps  
 changing them is unnecessary.

As I understand it, the received wisdom of the C community is that
int means the machine's natural, most efficient word width.  The
C specification was written at a time when a fair percentage of hardware
thought that meant int16 (and I do remember programming such hardware).
But there are no longer any machines ... or at least none on which you'd
want to run Postgres ... for which int means int16; today I'd assume
that int means probably int32, maybe int64 if that's really faster
on this machine.
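
If anyone wanted the assumption to be explicit rather than folklore, a
compile-time guard costs nothing (a sketch; nothing like it exists in
the tree today):

/*
 * The negative array size makes this declaration illegal on any
 * platform where int is narrower than 32 bits, failing the build.
 */
typedef char int_is_at_least_32_bits[sizeof(int) >= 4 ? 1 : -1];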

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Deadlock with tsearch2 index ...

2005-05-31 Thread Oleg Bartunov

On Wed, 1 Jun 2005, Christopher Kings-Lynne wrote:


How much money (US Dollars) would you need?



Command Prompt could jump on that as well. We could help sponsor a bit.


Maybe we could start a funding project for it?

USD converts to lots of roubles, I assume, so it'd be good like that. Perhaps


What do you mean?  It's a common misconception that life in Russia is cheap.

From my experience, living in California is cheaper than here in Moscow.

Ten years ago it was profitable to work in the US for a time, but not now :(

someone (not me - too busy) on the PostgreSQL Foundation could organise 
contacting some companies to see if we can get a little out of several :)


ie. Command Prompt, Fujitsu, Pervasive, EnterpriseDB, etc.?

Oleg & Teodor: how about letting us know what funding target we'd need?


We'd love to start tomorrow working on GiST, but we have some current
obligations we need to sort out.



Chris



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq