Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote:
> Tom, I think you're the only person that could or would be trusted to
> make such a change. Even past the 8.1 freeze, I say we need to do
> something now on this issue.
>
> > I think if we document full_page_writes as similar to fsync in risk, we
> > are OK for 8.1, but if something can be done easily, it sounds good.
> >
> > Now that we have a GUC we can experiment with the full page write load
> > and see how it can be improved.
>
> Question, with this option if the power goes out will I just roll
> through the transaction logs like normal?

Most probably, yes. But:

> Or are we talking the
> potential to have to use something like pg_resetxlog or similar?

Potentially. Just depends on what sort of crash occurred...

> If it is just roll through the transaction logs then I have no problem
> with it, let the user decide the level of reliance they have. If it can
> cause actual, need-to-restore-from-backup level damage then it is a
> literal no go IMHO.

Well, it can't *cause* problems, but it doesn't solve them when they
occur, as the current design does. If crash recovery fails, and it may
do (and worst of all it might not fail when it should have) then you
will need to recover the database using PITR or a replication mechanism.

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruno Wolff III [EMAIL PROTECTED] writes:
> > > Are you sure about that? That would probably be the normal case, but are
> > > you promised that the hardware will write all of the sectors of a block
> > > in order?
> >
> > I don't think you can possibly assume that.  If the block crosses a
> > cylinder boundary then it's certainly an unsafe assumption, and even
> > within a cylinder (no seek required) I'm pretty sure that disk drives
> > have understood "write the next sector that passes under the heads"
> > for decades.
>
> SCSI tagged queueing certainly allows 512-byte blocks to be reordered
> during writes.

Then a torn-page tell-tale is required that will tell us of any change
to any of the 512-byte sectors that make up a block/page.

Here's an idea:

We read the page that we would have backed up, calc the CRC and write a
short WAL record with just the CRC, not the block. When we recover we
re-read the database page, calc its CRC and compare it with the CRC from
the transaction log. If they differ, we know that the page was torn and
we know the database needs recovery. (So we calc the CRC when we log AND
when we recover).

This avoids the need to write full pages, though slightly slows down
recovery.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] A couple of patches for PostgreSQL 64bit support

2005-07-07 Thread Koichi Suzuki
Hi, all,

I have posted a couple of patches for 64bit environment support to the
PATCHES ml.   They expand the size of shared memory to a 64bit space
and extend XID to 64bit.   Please take a look at them.

-- 
---
Koichi Suzuki
Open Source Engineering Department,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
--

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD

> > Are you sure about that? That would probably be the normal case, but
> > are you promised that the hardware will write all of the sectors of a
> > block in order?
>
> I don't think you can possibly assume that.  If the block
> crosses a cylinder boundary then it's certainly an unsafe
> assumption, and even within a cylinder (no seek required) I'm
> pretty sure that disk drives have understood "write the next
> sector that passes under the heads"
> for decades.

A lot of hardware exists that guards against partial writes
of single I/O requests (a persistent write cache for an HP RAID
controller for Intel servers costs ~$500 extra).

But the OS usually has a 4k (sometimes 8k) filesystem buffer size,
and since we do not use direct I/O for data files, the OS might decide
to schedule the two 4k writes for one 8k page differently.

If you do not build pg to match your filesystem buffer size, you cannot
guard against partial writes with hardware :-(

We could alleviate that problem with direct I/O for data files.

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD

> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and
> write a short WAL record with just the CRC, not the block. When
> we recover we re-read the database page, calc its CRC and
> compare it with the CRC from the transaction log. If they
> differ, we know that the page was torn and we know the database
> needs recovery. (So we calc the CRC when we log AND when we recover).

Won't work, since the page on disk may have had several different
contents between two checkpoints (the bgwriter writes pages from the
LRU).

Only workable solution would imho be to write the LSN to each 512-byte
block (not that I am propagating that idea).

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes:
> Only workable solution would imho be to write the LSN to each 512-byte
> block (not that I am propagating that idea).

We're not doing anything like that, as it would create an impossible
space-management problem (or are you happy with limiting tuples to
500 bytes?).  What we *could* do is calculate a page-level CRC and
store it in the page header just before writing out.  Torn pages
would then manifest as a wrong CRC on read.  No correction ability,
but at least a reliable detection ability.

However, this will do nothing to solve the performance problem if
the core of that problem is the cost of computing page-sized CRCs :-(

We still don't know enough about the situation to know what a solution
might look like.  Is the slowdown Josh is seeing due to the extra CPU
cost of the CRCs, or the extra I/O cost, or excessive locking of the
WAL-related data structures while we do this stuff, or ???.  Need more
data.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
I wrote:
> We still don't know enough about the situation to know what a solution
> might look like.  Is the slowdown Josh is seeing due to the extra CPU
> cost of the CRCs, or the extra I/O cost, or excessive locking of the
> WAL-related data structures while we do this stuff, or ???.  Need more
> data.

Josh, is OSDL up enough that you can try another comparison run?
If so, please undo the previous patch (which disabled page dumping
entirely) and instead try removing this block of code, starting
at about xlog.c line 620 in CVS tip:

/*
 * Now add the backup block headers and data into the CRC
 */
for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
{
    if (dtbuf_bkp[i])
    {
        BkpBlock   *bkpb = &(dtbuf_xlg[i]);
        char       *page;

        COMP_CRC32(rdata_crc,
                   (char *) bkpb,
                   sizeof(BkpBlock));
        page = (char *) BufferGetBlock(dtbuf[i]);
        if (bkpb->hole_length == 0)
        {
            COMP_CRC32(rdata_crc,
                       page,
                       BLCKSZ);
        }
        else
        {
            /* must skip the hole */
            COMP_CRC32(rdata_crc,
                       page,
                       bkpb->hole_offset);
            COMP_CRC32(rdata_crc,
                       page + (bkpb->hole_offset + bkpb->hole_length),
                       BLCKSZ - (bkpb->hole_offset + bkpb->hole_length));
        }
    }
}

This will remove just the CRC calculation work associated with backed-up
pages.  Note that any attempt to recover from the WAL will fail, but I
assume you don't need that for the purposes of the test run.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD

> Only workable solution would imho be to write the LSN to each 512-byte
> block (not that I am propagating that idea).

"Only workable" was a stupid formulation; I meant a solution that works
with an LSN.

> We're not doing anything like that, as it would create an
> impossible space-management problem (or are you happy with
> limiting tuples to 500 bytes?).

To do it, a layer between the physical storage and the rows in working
memory would need to be inserted; of course that would add a lot of
overhead. I guess more overhead than computing a page CRC.

> We still don't know enough about the situation to know what a solution
> might look like.  Is the slowdown Josh is seeing due to the extra CPU
> cost of the CRCs, or the extra I/O cost, or excessive locking of the
> WAL-related data structures while we do this stuff, or ???.  Need more
> data.

Yes, especially the 10 sec instead of 1 sec response times look very
suspicious.

Andreas

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] windows regression failure - prepared xacts

2005-07-07 Thread Andrew Dunstan


I am consistently seeing the regression failure shown below on my 
Windows machine. See 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13


(On the plus side, I am now building happily and passing regression 
tests with ASPerl, and hope to add ASPython and ASTcl to the list shortly).


cheers

andrew


== pgsql.2072/src/test/regress/regression.diffs 
===
*** ./expected/prepared_xacts.out   Thu Jul  7 09:55:18 2005
--- ./results/prepared_xacts.outThu Jul  7 10:20:37 2005
***
*** 179,189 
 -- Commit table creation
 COMMIT PREPARED 'regress-one';
 \d pxtest2
!        Table "public.pxtest2"
!  Column |  Type   | Modifiers 
! +-+---
!  a  | integer | 
! 
 SELECT * FROM pxtest2;
  a 
 ---

--- 179,185 
 -- Commit table creation
 COMMIT PREPARED 'regress-one';
 \d pxtest2
! ERROR:  cache lookup failed for relation 27240
 SELECT * FROM pxtest2;
  a 
 ---


==






---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [INTERFACES] By Passed Domain Constraints

2005-07-07 Thread Robert Perry

Tom

Thank you very much.  This sounds like my problem exactly.   I  
personally, feel that the change you have described is the right way  
to go for PostgreSQL. But, since the thing that I expected to work  
does not and would with your suggested change I guess that my opinion  
is pretty predictable.


B.T.W., using PQexec instead of PQexecParams also solves my problem.
But that is not a surprise either, given your assessment of the
problem.  Since all of the C++ code in my project ends up calling a
single function that calls PQexecParams (this was done to centralize
the conversion of PostgreSQL exceptions to our own internal exception
classes), I think it is going to be easier for us to make this
function dynamically build a non-parameterized query.  But I still
appreciate your advice on a workaround and I am holding it as my
plan B.


Thanks again
Robert Perry

On Jul 6, 2005, at 12:05 PM, Tom Lane wrote:


Robert Perry [EMAIL PROTECTED] writes:
> I have also been bitten by the problem you are describing. But,
> that one is a problem even when called from psql if I am not
> mistaken.  Does psql not use libpq?  Perhaps it is something about
> PQexecParams that is the problem.  I will test in a little while.

[ thinks about it... ]  If you've declared the function input parameter
as a domain type and then write a parameterized query like
	... function($1) ...
and don't specify any particular datatype for the parameter symbol,
I think the backend will infer the domain type as the parameter type.
Which would also allow bypassing the domain checks.

You could work around this by explicitly specifying the parameter
type as text or varchar or whatever the domain's base type is.
I wonder though if we oughtn't change the backend so that the inferred
type of a parameter symbol is never a domain, but the domain's base
type.  That would force the proper application of CoerceToDomain inside
the constructed query parsetree.

			regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] SQL99 - Nested Tables

2005-07-07 Thread Darren Alcorn
I was interested as to if there were plans to develop SQL99 nested
tables. I know with AJAX(tm) starting to grow in popularity that the
XML features of SQL2003 would prove useful for EnterpriseDB.

I have developed applications using Microsoft SQL Server since 1999. I
have only been programming in C for about a year. I was curious as to
what I might do to contribute towards nested table support. I
personally think it's needed, and am willing to do whatever I need to
do to make it happen.

My first step was to get the latest source and try and understand it some.


Darren Alcorn

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote:
> > SCSI tagged queueing certainly allows 512-byte blocks to be reordered
> > during writes.
>
> Then a torn-page tell-tale is required that will tell us of any change
> to any of the 512-byte sectors that make up a block/page.
>
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and write a
> short WAL record with just the CRC, not the block. When we recover we
> re-read the database page, calc its CRC and compare it with the CRC from
> the transaction log. If they differ, we know that the page was torn and
> we know the database needs recovery. (So we calc the CRC when we log AND
> when we recover).
>
> This avoids the need to write full pages, though slightly slows down
> recovery.

Yes, that is a good idea!  That torn-page thing sounded like a mess, and
I love that we can check pages on recovery rather than whenever you
happen to access them.

What would be great would be to implement this when full_page_writes is
off, _and_ have the page writes happen when the page is written to disk
rather than modified in the shared buffers.

I will add those to the TODO list now.  Updated item:

* Eliminate need to write full pages to WAL before page modification 
  [wal]

  Currently, to protect against partial disk page writes, we write
  full page images to WAL before they are modified so we can correct any
  partial page writes during recovery.  These pages can also be
  eliminated from point-in-time archive files.

o  -Add ability to turn off full page writes
o  When off, write CRC to WAL and check file system blocks
   on recovery
o  Write full pages during file system write and not when
   the page is modified in the buffer cache

   This allows most full page writes to happen in the background
   writer.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
> Yes, that is a good idea!

... which was shot down in the very next message.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Zeugswetter Andreas DAZ SD wrote:
> > > Are you sure about that? That would probably be the normal case, but
> > > are you promised that the hardware will write all of the sectors of a
> > > block in order?
> >
> > I don't think you can possibly assume that.  If the block
> > crosses a cylinder boundary then it's certainly an unsafe
> > assumption, and even within a cylinder (no seek required) I'm
> > pretty sure that disk drives have understood "write the next
> > sector that passes under the heads"
> > for decades.
>
> A lot of hardware exists that guards against partial writes
> of single I/O requests (a persistent write cache for an HP RAID
> controller for Intel servers costs ~$500 extra).
>
> But the OS usually has a 4k (sometimes 8k) filesystem buffer size,
> and since we do not use direct I/O for data files, the OS might decide
> to schedule the two 4k writes for one 8k page differently.
>
> If you do not build pg to match your filesystem buffer size, you cannot
> guard against partial writes with hardware :-(
>
> We could alleviate that problem with direct I/O for data files.

Now that is an interesting analysis.  I thought people who used
battery-backed drive cache wouldn't have partial page write problems,
but I now see it is certainly possible.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

> Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes:
> > Only workable solution would imho be to write the LSN to each 512-byte
> > block (not that I am propagating that idea).
>
> We're not doing anything like that, as it would create an impossible
> space-management problem (or are you happy with limiting tuples to
> 500 bytes?).  What we *could* do is calculate a page-level CRC and
> store it in the page header just before writing out.  Torn pages
> would then manifest as a wrong CRC on read.  No correction ability,
> but at least a reliable detection ability.

At the same time as you do the CRC you can copy the bytes to a fresh page
skipping the LSNs. Likewise, when writing out the page you have to calculate
the CRC; at the same time as you calculate the CRC you write out the bytes to
a temporary buffer adding LSNs and write that to disk.

This would be zero-copy if you're already scanning the bytes to calculate
the CRC since you can add and remove LSNs at the same time. It does require an
extra buffer to store the page in before writing and that entails some amount
of cache thrashing. But maybe you could reuse the same buffer over and over
again for every read/write.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
> > What we *could* do is calculate a page-level CRC and
> > store it in the page header just before writing out.  Torn pages
> > would then manifest as a wrong CRC on read.  No correction ability,
> > but at least a reliable detection ability.
>
> At the same time as you do the CRC you can copy the bytes to a fresh page
> skipping the LSNs. Likewise, when writing out the page you have to calculate
> the CRC; at the same time as you calculate the CRC you write out the bytes to
> a temporary buffer adding LSNs and write that to disk.

Huh?  You seem to be proposing doing *both* things, which sounds entirely
pointless.

BTW, I was envisioning the page CRCs as something we'd only check during
crash recovery, not normal-operation reads.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian pgman@candle.pha.pa.us writes:
> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers ---
>
> That is *completely* unworkable.  Or were you planning to abandon the
> promise that a transaction is committed when we have flushed its WAL
> commit record?

"Completely" is a strong word.

What is on disk at the time the page is modified in the shared buffer is
just fine for recovery (it is the same as what we write to WAL anyway). 
It is just when the page gets written to disk that it changes for
recovery, so potentially during recovery you could take what is on disk,
modify it by reading WAL, then replace it later with the image from WAL.

The only problem I see is that the page might be partially written and
WAL modifications to the page might fail but later the page will be
replaced anyway.  Perhaps we could record pages that are corrupt
(hopefully only one) and make sure later page images replace them, or we
fail on recovery.  

> > Seems it is similar to fsync in risk, which is not a new option.
>
> The point here is that fsync-off is only realistic for development
> or playpen installations.  You don't turn it off in a production
> machine, and I can't see that you'd turn off the full-page-write
> option either.  So we have not solved anyone's performance problem.

Yes, this is basically another fsync-like option that isn't for
production usage in most cases.  Sad but true.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
> Tom Lane wrote:
> > The point here is that fsync-off is only realistic for development
> > or playpen installations.  You don't turn it off in a production
> > machine, and I can't see that you'd turn off the full-page-write
> > option either.  So we have not solved anyone's performance problem.
>
> Yes, this is basically another fsync-like option that isn't for
> production usage in most cases.  Sad but true.

Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1.  It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian pgman@candle.pha.pa.us writes:
> > Tom Lane wrote:
> > > The point here is that fsync-off is only realistic for development
> > > or playpen installations.  You don't turn it off in a production
> > > machine, and I can't see that you'd turn off the full-page-write
> > > option either.  So we have not solved anyone's performance problem.
> >
> > Yes, this is basically another fsync-like option that isn't for
> > production usage in most cases.  Sad but true.
>
> Just to make my position perfectly clear: I don't want to see this
> option shipped in 8.1.  It's reasonable to have it in there for now
> as an aid to our performance investigations, but I don't see that it
> has any value for production.

Well, this is the first I am hearing that, and of course your position
is just one vote.

One idea would be to just tie its behavior directly to fsync and remove
the option completely (that was the original TODO), or we can adjust it
so it doesn't have the same risks as fsync, or the same lack of failure
reporting as fsync.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Joshua D. Drake



Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1.  It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.



Well, this is the first I am hearing that, and of course your position
is just one vote.


True, but your feature was added after feature freeze ;). I don't see
this as a good thing overall. We should be looking for a solution, not a
band-aid that pulls the skin off when you tear it away.


Sincerely,

Joshua D. Drake


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Joshua D. Drake wrote:
> > > Just to make my position perfectly clear: I don't want to see this
> > > option shipped in 8.1.  It's reasonable to have it in there for now
> > > as an aid to our performance investigations, but I don't see that it
> > > has any value for production.
> >
> > Well, this is the first I am hearing that, and of course your position
> > is just one vote.
>
> True but your feature was added after feature freeze ;). I don't see

My patch was posted days before the feature freeze.

> this as a good thing overall. We should be looking for a solution not a
> band-aid that if you tear it off will pull the skin.

Sure, having it be _like_ fsync is not a good thing.  Seems we can roll
it into the fsync option, improve it, or remove it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> > Well, I added #1 yesterday as 'full_page_writes', and it has the same
> > warnings as fsync (namely, on crash, be prepared to recover or check
> > your system thoroughly).
>
> Yes, which is why I comment now that the GUC alone is not enough.
>
> There is no way to check your system thoroughly. If there is a certain
> way of knowing torn pages had *not* occurred, then I would be happy.

Yep, it is a pain, and like fsync.

> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers --- the goal being that it will even out
> > the load, and it will happen in a non-critical path, hopefully by the
> > background writer or at checkpoint time.
>
> The page must be written before the changes to the page are written, so
> that they are available sequentially in the log for replay. The log and
> the database are not connected, so we cannot do it that way. If the page
> is written out of sequence from the changes to it, how would recovery
> know where to get the page from?

See my later email --- the full page will be restored later from WAL, so
our changes don't have to be made at that point.

> ISTM there is mileage in your idea of trying to shift the work to
> another time. My thought is "which blocks exactly are the ones being
> changed?". Maybe that would lead to a reduction.
>
> > > With wal_changed_pages = off *any* crash would possibly require an
> > > archive recovery, or a replication rebuild. It's good that we now have
> > > PITR, but we do also have other options for availability. Users of
> > > replication could well be amongst the first to try out this option.
> >
> > Seems it is similar to fsync in risk, which is not a new option.
>
> Risk is not acceptable. We must have certainty, either way.
>
> Why have two GUCs? Why not just have one GUC that does both at the same
> time? When would you want one but not the other?
> 	risk_data_loss_to_gain_performance = true

Yep, one new one might make sense.

> > I think if we document full_page_writes as similar to fsync in risk, we
> > are OK for 8.1, but if something can be done easily, it sounds good.
>
> Documenting something simply isn't enough. I simply cannot advise
> anybody ever to use the new GUC. If their data was low value, they
> wouldn't even be using PostgreSQL, they'd use a non-transactional DBMS.
>
> I agree we *must* have the GUC, but we also *must* have a way for crash
> recovery to tell us for certain that it has definitely worked, not just
> maybe worked.

Right.  I am thinking your CRC write to WAL might do that.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Must be owner to truncate?

2005-07-07 Thread Stephen Frost
Greetings,

  The current permissions checks for truncate seem to be excessive.  It
  requires that you're the owner of the relation instead of requiring
  that you have delete permissions on the relation.  It was pointed out
  that truncate doesn't call triggers but it seems like that would be
  something easy enough to check for.  My thinking is to replace the
  existing ownercheck with:

  Must have delete permissions on the relation
  If the relation has triggers:
Check that the caller is the owner of the relation, if so, then
issue a NOTICE that those triggers won't be called and perform the
truncate.
If not the owner, then error out saying there are ON DELETE triggers
and that you're not the owner.

  I can submit a patch for this today if there's general agreement on
  this change.  An alternative that was mentioned was to make 'delete'
  smart enough to know when it's delete'ing all the rows and there
  aren't any triggers on it, etc, to perform like truncate, perhaps
  leaving the old file around until all transactions using it have
  finished.  This sounds like a good idea but also sounds like it'd be a
  larger change and might have to wait till 8.2.

Thanks,

Stephen




Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread Josh Berkus
Darren,

> I was interested as to if there were plans to develop SQL99 nested
> tables. I know with AJAX(tm) starting to grow in popularity that the
> XML features of SQL2003 would prove useful for EnterpriseDB.

I realize that nested tables are in SQL99, but so is SQLJ and a few other
really dumb ideas.  As Joe Celko (member of the SQL92 committee) put it,
"we were a committee, and a committee never met a feature it didn't like."

Can you make a use-case for nested tables?   It would need to be a fairly 
strong one, given that they basically violate the relational model.  If what 
you're really interested in is XML data support, then I'd suggest focusing on 
that instead.

Of course, that's just my opinion.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] A couple of patches for PostgreSQL 64bit support

2005-07-07 Thread Josh Berkus
Koichi,

 I have posted a couple of patches with regard to 64bit environment
 support to PATCHES ml.   It expands size of shared memory to 64bit space
 and extends XID to 64bit.   Please take a look at it.

In case you weren't aware, feature freeze was last Friday.   So your patch is 
liable to remain in the queue for a while before anyone looks at it.

Incidentally, what about 64-bit support for work_mem and maintenance_work_mem?  
64bit shared_mem support isn't that needed *yet* (I've yet to see a server 
use more than 500mb of the shared_mem) but being able to allocate 6GB to 
index creation would be very useful.
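The kind of setting Josh has in mind might look like the following (names are hypothetical, and in 8.x these GUCs are integers in kilobytes; whether the backend can actually make use of an allocation this large is exactly the 64-bit question at issue):

```sql
-- Illustrative only: 8.x takes these values in KB.
SET maintenance_work_mem = 6291456;   -- roughly 6 GB for a big sort
CREATE INDEX idx_big ON big_table (some_col);
```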

I take it extending the XID to 64bit is intended to postpone the need for 
vacuuming even in a high-activity database?  Have you tested whether there 
are any performance effects?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Josh Berkus
Tom,

 Josh, is OSDL up enough that you can try another comparison run?

Thankfully, yes.

 If so, please undo the previous patch (which disabled page dumping
 entirely) and instead try removing this block of code, starting
 at about xlog.c line 620 in CVS tip:

Will do.  Results in a few days.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
   The current permissions checks for truncate seem to be excessive.  It
   requires that you're the owner of the relation instead of requiring
   that you have delete permissions on the relation.  It was pointed out
   that truncate doesn't call triggers but it seems like that would be
   something easy enough to check for.

There are other reasons for restricting it:
 * truncate takes a much stronger lock than a plain delete does.
 * truncate is not MVCC-safe.

I don't really agree with the viewpoint that truncate is just a quick
DELETE, and so I do not agree that DELETE permissions should be enough
to let you do a TRUNCATE.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
The current permissions checks for truncate seem to be excessive.  It
requires that you're the owner of the relation instead of requiring
that you have delete permissions on the relation.  It was pointed out
that truncate doesn't call triggers but it seems like that would be
something easy enough to check for.
 
 There are other reasons for restricting it:
  * truncate takes a much stronger lock than a plain delete does.

What permissions are required to lock a table?  With just select,
insert, update and delete on a table I can LOCK TABLE it, which acquires
an ACCESS EXCLUSIVE on it and will therefore hold off anyone else from
using the table till the end of my transaction anyway.  So I don't see
this as being a reason to disallow non-owners use of truncate.
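Stephen's point can be demonstrated directly: any role holding an ordinary privilege on a table can take the strongest table lock (table and role names hypothetical):

```sql
SET ROLE webuser;   -- has only SELECT/DELETE on big_table, not ownership
BEGIN;
LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;  -- succeeds
-- every other session now blocks on big_table until this transaction ends
ROLLBACK;
```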

  * truncate is not MVCC-safe.

Erm, that's why it gets a stronger lock, so I don't really see what
this has to do with it.

 I don't really agree with the viewpoint that truncate is just a quick
 DELETE, and so I do not agree that DELETE permissions should be enough
 to let you do a TRUNCATE.

Truncate is exactly a quick DELETE; in fact, DELETE could stand to learn
a few things from truncate to make it suck a little less to
'delete from x;' when x is a reasonably large table.  This probably
wouldn't actually be all that difficult to do if there's a way to keep
the old file around until all the transactions using it have completed
that's not too expensive, etc.

Thanks,

Stephen




Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread David Fetter
On Thu, Jul 07, 2005 at 09:52:44AM -0700, Josh Berkus wrote:
 Darren,
 
  I was interested as to if there were plans to develop SQL99 nested
  tables. I know with AJAX(tm) starting to grow in popularity that
  the XML features of SQL2003 would prove useful for EnterpriseDB.
 
 I realize that nested tables are in SQL99, but so is SQLJ and a few
 other really dumb ideas.  As Joe Celko (member of the SQL92
 committee) put it, we were a committee, and a committee never met a
 feature it didn't like.
 
 Can you make a use-case for nested tables?

Now that the time travel option is gone, this is a way of implementing
it.

 It would need to be a fairly strong one, given that they basically
 violate the relational model.  If what you're really interested in
 is XML data support, then I'd suggest focusing on that instead.

XML data support would be best if coordinated.  There are several
different projects right now, and one more would be bad.

 Of course, that's just my opinion.

I think it would be kinda nice if a column type could be anything,
including SETOF some_composite_type.  If I recall correctly, Illustra
had this, and it was a pretty cool feature :)  It might also help
implement the SQL:2003 standard for UNNEST, MULTISET, FUSION, COLLECT
and INTERSECT.
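For reference, the SQL:2003 collection syntax being alluded to looks roughly like this (a sketch of the standard's syntax, not anything PostgreSQL accepts):

```sql
-- A column whose type is a multiset of a row type:
CREATE TABLE employee (
    name    VARCHAR(100),
    skills  ROW(skill VARCHAR(50), level INTEGER) MULTISET
);

-- UNNEST flattens the collection back out into ordinary rows:
SELECT e.name, s.skill
FROM employee e, UNNEST(e.skills) AS s(skill, level);
```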

In re: the relational model, that's already broken, although Date's
excellent new book from O'Reilly encourages people to implement
Tutorial D. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Andrew - Supernews
On 2005-07-07, Stephen Frost [EMAIL PROTECTED] wrote:

  * truncate is not MVCC-safe.

 Erm, that's why it gets a stronger lock, so I don't really see what
 this has to do with it.

It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
that were taken before the truncate operation but which don't have a lock
on the table yet. The only reason it doesn't break pg_dump is that the
first thing that pg_dump does is to take AccessShare locks on every table
that it's going to dump.
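The anomaly Andrew describes can be sketched as two concurrent sessions (assuming a serializable transaction whose snapshot is taken before it acquires any lock on the table):

```sql
-- Session A                              -- Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM other_t;  -- snapshot taken; no lock on t yet
                                          TRUNCATE t;   -- AccessExclusive,
                                          -- new empty heap file; COMMIT
SELECT count(*) FROM t;
-- returns 0: A's snapshot should still see the old rows, but the
-- truncated heap is gone, so the pre-truncation data is unrecoverable
COMMIT;
```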

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Kenneth Marshall
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  What we *could* do is calculate a page-level CRC and
  store it in the page header just before writing out.  Torn pages
  would then manifest as a wrong CRC on read.  No correction ability,
  but at least a reliable detection ability.
 
  At the same time as you do the CRC you can copy the bytes to a fresh page
  skipping the LSNs. Likewise, when writing out the page you have to calculate
  the CRC; at the same time as you calculate the CRC you write out the bytes 
  to
  a temporary buffer adding LSNs and write that to disk.
 
 Huh?  You seem to be proposing doing *both* things, which sounds entirely
 pointless.
 
 BTW, I was envisioning the page CRCs as something we'd only check during
 crash recovery, not normal-operation reads.
 
   regards, tom lane
 
Does the DB page on disk have a version number? If so, maybe we could
update the WAL with the CRC+version anytime the page is updated. You may
need to check the log for multiple CRC+version entries to determine the
torn-page status.

Ken

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread Darren Alcorn
I was mainly interested because of the simplicity it seems to add for implementing an application using the database. While those accustomed to writing SQL queries using joins and keys might prefer it for many understandable reasons, there is something to be said for multidimensional data structures. It would be like if you _had_ to have multiple arrays to store information in C instead of using a multidimensional array.

I'm open to debate on the subject as I'd love to be convinced that Oracle is wrong. I think the XML features are important and I'd be more suited writing something more straight forward versus re-inventing the wheel. I brought it up for debate, because I thought it was something that should be thought about.


Darren Alcorn

On 7/7/05, Josh Berkus josh@agliodbs.com wrote:
 Darren,
 
  I was interested as to if there were plans to develop SQL99 nested
  tables. I know with AJAX(tm) starting to grow in popularity that the
  XML features of SQL2003 would prove useful for EnterpriseDB.
 
 I realize that nested tables are in SQL99, but so is SQLJ and a few other
 really dumb ideas.  As Joe Celko (member of the SQL92 committee) put it, we
 were a committee, and a committee never met a feature it didn't like.
 
 Can you make a use-case for nested tables?  It would need to be a fairly
 strong one, given that they basically violate the relational model.  If what
 you're really interested in is XML data support, then I'd suggest focusing
 on that instead.
 
 Of course, that's just my opinion.
 
 --
 Josh Berkus
 Aglio Database Solutions
 San Francisco



[HACKERS] 8.1dev ssl broke?

2005-07-07 Thread Joshua D. Drake

I pulled cvs today and performed the following:

./configure --with-openssl --prefix=/tmp/pgsqldev
make install
cd /tmp/pgsqldev
initdb --no-locale -D data -l data/serverlog

pg_hba.conf lines:

# local is for Unix domain socket connections only
#local   all all   trust
# IPv4 local connections:
hostssl   all all 127.0.0.1/32trust
# IPv6 local connections:
hostall all ::1/128   trust

postgresql.conf

listen_addresses = 'localhost'  # what IP interface(s) to listen on;
# defaults to localhost, '*' = any
port = 5432
ssl = on


When I try to connect I get:


LOG:  redo record is at 0/34D73C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 561; next OID: 16385
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484132, limited by database 
"postgres"
LOG:  invalid entry in file "/tmp/pgsqldev/data/pg_hba.conf" at line 73, 
token "hostssl"

FATAL:  missing or erroneous pg_hba.conf file
HINT:  See server log for details.
DEBUG:  forked new backend, pid=26717 socket=6
DEBUG:  server process (PID 26717) exited with exit code 0

If I change the entries to:

# local is for Unix domain socket connections only
#local   all all   trust
# IPv4 local connections:
host   all all 127.0.0.1/32trust
# IPv6 local connections:
hostall all ::1/128   trust

Everything works fine.

Sincerely,

Joshua D. Drake


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] 8.1dev ssl broke?

2005-07-07 Thread Joshua D. Drake

FYI: I also followed the instructions per:

http://developer.postgresql.org/docs/postgres/ssl-tcp.html

Joshua D. Drake wrote:

[original message quoted in full; see above]





--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread Josh Berkus
Darren,

 I was mainly interested because of the simplicity it seems to add for
 implementing an application using the database. While those accustomed
 to writing SQL queries using joins and keys might prefer it for many
 understandable reasons, there is something to be said for
 multidimensional data structures. It would be like if you _had_ to have
 multiple arrays to store information in C instead of using a
 multidimensional array. I'm open to debate on the subject as I'd love to
 be convinced that Oracle is wrong.

Ooops.  Our discussion somehow got shifted off list.  Suffice it to say 
that not everyone agrees with me.

 I think the XML features are 
 important and I'd be more suited writing something more straight forward
 versus re-inventing the wheel. I brought it up for debate, because I
 thought it was something that should be thought about.

Yes, I just don't see how nested tables relate to XML.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.1dev ssl broke?

2005-07-07 Thread Michael Fuhr
On Thu, Jul 07, 2005 at 12:30:35PM -0700, Joshua D. Drake wrote:
 I pulled cvs today and performed the following:
 
 ./configure --with-openssl --prefix=/tmp/pgsqldev

Did the build actually find OpenSSL?  Does ldd postgres show it
linked against libcrypto and libssl (I'm assuming those are shared
libraries on your system)?

 LOG:  invalid entry in file /tmp/pgsqldev/data/pg_hba.conf at line 73, 
 token hostssl

No problems here with the latest HEAD.  Is it possible that you're
running a non-SSL enabled postmaster, either because the build didn't
find OpenSSL or because the postmaster you ran is from a different
build?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.1dev ssl broke?

2005-07-07 Thread Joshua D. Drake

Michael Fuhr wrote:

On Thu, Jul 07, 2005 at 12:30:35PM -0700, Joshua D. Drake wrote:


I pulled cvs today and performed the following:

./configure --with-openssl --prefix=/tmp/pgsqldev



Did the build actually find OpenSSL?  Does ldd postgres show it
linked against libcrypto and libssl (I'm assuming those are shared
libraries on your system)?


Bingo... I didn't think about the ldd. **notes** make clean before
recompiling ;)

Sincerely,

Joshua D. Drake




--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] no subject

2005-07-07 Thread Ayush Parashar




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Jim C. Nasby
On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
The current permissions checks for truncate seem to be excessive.  It
requires that you're the owner of the relation instead of requiring
that you have delete permissions on the relation.  It was pointed out
that truncate doesn't call triggers but it seems like that would be
something easy enough to check for.
 
 There are other reasons for restricting it:
  * truncate takes a much stronger lock than a plain delete does.
  * truncate is not MVCC-safe.
 
 I don't really agree with the viewpoint that truncate is just a quick
 DELETE, and so I do not agree that DELETE permissions should be enough
 to let you do a TRUNCATE.

What about adding a truncate permission? I would find it useful, as it
seems would others.
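Under that proposal, granting the new privilege might look like this (hypothetical syntax; no such privilege exists as of 8.1):

```sql
-- owner delegates truncation without delegating ownership
GRANT TRUNCATE ON big_table TO webuser;
```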
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] Multi-byte and client side character encoding tests for copy command..

2005-07-07 Thread Ayush Parashar
Hi there,

I am new to this list.

I have made some additions to multi-byte regression tests ( ./src/test/mb),
to include regression tests for copy command. This can be used to test
multi-byte extensions of postgresql and client character encoding, for copy
command. The test uses the following complete character sets: big5 gbk uhc
gb18030 euc_cn latin8 iso_8859_6 iso_8859_7 euc_kr euc_jp.

I am attaching only the modified README and mbregress.sh with the email. I
have not included the various character sets with the email as they are
nearly 1MB. 

Can you please comment on this..?

I can provide a patch for the same, which includes the character sets.

Thanks,
Ayush

ps: sorry for the previous blank email, that was by mistake.



README
Description: Binary data


mbregress.sh
Description: Binary data

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Stephen Frost
* Jim C. Nasby ([EMAIL PROTECTED]) wrote:
 On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
  I don't really agree with the viewpoint that truncate is just a quick
  DELETE, and so I do not agree that DELETE permissions should be enough
  to let you do a TRUNCATE.
 
 What about adding a truncate permission? I would find it useful, as it
 seems would others.

That would be acceptable for me as well.  I'd prefer it just work off
delete, but as long as I can grant truncate to someone w/o giving them
ownership rights on the table I'd be happy.

Thanks,

Stephen




Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Stephen Frost
* Andrew - Supernews ([EMAIL PROTECTED]) wrote:
 On 2005-07-07, Stephen Frost [EMAIL PROTECTED] wrote:
 
   * truncate is not MVCC-safe.
 
  Erm, that's why it gets a stronger lock, so I don't really see what
  this has to do with it.
 
 It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
 that were taken before the truncate operation but which don't have a lock
 on the table yet. The only reason it doesn't break pg_dump is that the
 first thing that pg_dump does is to take AccessShare locks on every table
 that it's going to dump.

This seems like something which should probably be fixed, but which is
probably too late to fix for 8.1.  Of course, if we could fix this then
it seems like it would be possible for us to just change 'delete from x'
to behave as truncate does now given appropriate conditions.  I'm not as
familiar with that area as others are; is this a very difficult thing to
do?  If not then I may take a look at it, it'd be a very nice
improvement.

Thanks,

Stephen




[HACKERS] Pg_autovacuum on FreeBSD

2005-07-07 Thread Rodrigo Moreno
Hi All,

pg_autovacuum on FreeBSD with PostgreSQL 8.0.3 is not working. It just does
nothing: no log output, nothing on screen, no daemonizing.

It was OK on 7.4.6.

Could someone help me?

Best Regards
Rodrigo Moreno


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread David Fetter
On Thu, Jul 07, 2005 at 12:53:14PM -0700, Josh Berkus wrote:
 Darren,
 
  I was mainly interested because of the simplicity it seems to add for
  implementing an application using the database. While those accustomed
  to writing SQL queries using joins and keys might prefer it for many
  understandable reasons, there is something to be said for
  multidimensional data structures. It would be like if you _had_ to have
  multiple arrays to store information in C instead of using a
  multidimensional array. I'm open to debate on the subject as I'd love to
  be convinced that Oracle is wrong.
 
 Ooops.  Our discussion somehow got shifted off list.  Suffice it to say 
 that not everyone agrees with me.

Where not everyone includes one C. J. Date ;)

  I think the XML features are important and I'd be more suited
  writing something more straight forward versus re-inventing the
  wheel. I brought it up for debate, because I thought it was
  something that should be thought about.
 
 Yes, I just don't see how nested tables relate to XML.

To me, they don't relate directly, as tables (nested or otherwise)
have no intrinsic row ordering, where XML does.  Nested tables is a
Good Thing(TM) though :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] temp_buffers

2005-07-07 Thread Joshua D. Drake

Hello,

Can someone give me a reasonable explanation of what temp_buffers is for?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] temp_buffers

2005-07-07 Thread Alvaro Herrera
On Thu, Jul 07, 2005 at 04:16:58PM -0700, Joshua D. Drake wrote:

 Can someone give me a reasonable explanation of what temp_buffers is for?

Number of buffers to be used for temp tables.  Think shared_buffers, but
local to a connection instead of shared.

They are also used for new relations, in the transaction they are being
created.  Because no one else can see the relation, there's no point in
sharing their pages.

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
What difference does it make to the dead, the orphans, and the homeless,
whether the mad destruction is wrought under the name of totalitarianism
or the holy name of liberty and democracy? (Gandhi)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread Josh Berkus
David,

 Nested tables is a
 Good Thing(TM) though :)

Hmmm.   I don't buy that they are an a priori Good Thing.  What are they 
good for?  I can't think of a single occasion in my 12-year database 
career where I found myself wanting one.

Seems to me that, SQL standard or not, nested tables are just a jumbo-sized 
foot gun for new DBAs.  And it's not like they'll be trivial to implement.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] SQL99 - Nested Tables

2005-07-07 Thread David Fetter
On Thu, Jul 07, 2005 at 05:14:27PM -0700, Josh Berkus wrote:
 David,
 
 Nested tables is a Good Thing(TM) though :)
 
 Hmmm.   I don't buy that they are an a priori Good Thing.  What are
 they good for?  I can't think of a single occasion in my 12-year
 database career where I found myself wanting one.

If they'd been available when you started, you might find them
indispensable. :)

 Seems to me that, SQL standard or not, nested tables are just a
 jumbo-sized foot gun for new DBAs.

There are quite a few things in PostgreSQL than have already been
implemented which can serve this function.  Ask a new DBA what she
expects table inheritance to do or not to do, for example.

 And it's not like they'll be trivial to implement.

Probably not, but implementing them will very likely open the door to
implementing other parts of the SQL standard that really are fun. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Pg_autovacuum on FreeBSD

2005-07-07 Thread Mark Kirkwood

Rodrigo Moreno wrote:

Hi All,

The pg_autovacuum on FreeBSD and pg 803 is not working. Just do nothing, no
log, nothing in screen, no daemonize.

It was ok on pg746.

Could some one help me ?



What version of FreeBSD are you running?

Mark

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Christopher Kings-Lynne

  The current permissions checks for truncate seem to be excessive.  It
  requires that you're the owner of the relation instead of requiring
  that you have delete permissions on the relation.  It was pointed out
  that truncate doesn't call triggers but it seems like that would be
  something easy enough to check for.  My thinking is to replace the
  existing ownercheck with:

  Must have delete permissions on the relation
  If the relation has triggers:
Check that the caller is the owner of the relation, if so, then
issue a NOTICE that those triggers won't be called and perform the
truncate.
If not the owner, then error out saying there are ON DELETE triggers
and that you're not the owner.


I'm strongly in favour of this patch.  I am currently in this situation:

1. Web db user runs as non-superuser, non-owner.
2. I have a table of tens of thousands of rows that I must delete 
entirely and rebuild at least once a day (pg_trgm word list)

3. It just gets slow over time, even with autovac.
4. I can't vacuum it as I'm not the owner, and I cannot truncate it either.
5. Table has no triggers or FK's whatsoever.

So, Stephen Frost's suggestion would be fantastic.

Chris


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Christopher Kings-Lynne

There are other reasons for restricting it:
 * truncate takes a much stronger lock than a plain delete does.
 * truncate is not MVCC-safe.

I don't really agree with the viewpoint that truncate is just a quick
DELETE, and so I do not agree that DELETE permissions should be enough
to let you do a TRUNCATE.


Ah.  I didn't realise that 2nd point.  I don't care so much about the 
stronger lock in my application.


Chris


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Pg_autovacuum on FreeBSD

2005-07-07 Thread Christopher Kings-Lynne

The pg_autovacuum on FreeBSD and pg 803 is not working. Just do nothing, no
log, nothing in screen, no daemonize.

It was ok on pg746.

Could some one help me ?


They both work fine for me on my test box...

Are you aware that they change the port?  You need to put 
postgresql=YES in your /etc/rc.conf.


Chris


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Hmmm 8.1 pg_dumpall cannot dump older db's?

2005-07-07 Thread Christopher Kings-Lynne

I can't seem to dump old db's:

-bash-2.05b$ pg_dumpall -s -h database-dev > dump.sql
Password:
pg_dumpall: could not connect to database "postgres": FATAL:  database 
"postgres" does not exist


Seems that it is expecting the new 'postgres' database to exist on old 
installations?


Chris


---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Must be owner to truncate?

2005-07-07 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Andrew - Supernews ([EMAIL PROTECTED]) wrote:
 It's not MVCC-safe even with the AccessExclusive lock;

 This seems like something which should probably be fixed,

You've missed the point entirely: this *cannot* be fixed, at least not
without giving up the performance advantages that make TRUNCATE
interesting.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] temp_buffers

2005-07-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Thu, Jul 07, 2005 at 04:16:58PM -0700, Joshua D. Drake wrote:
 Can someone give me a reasonable explanation of what temp_buffers is for?

 Number of buffers to be used for temp tables.  Think shared_buffers, but
 local to a connection instead of shared.

 They are also used for new relations, in the transaction they are being
 created.  Because no one else can see the relation, there's no point in
 sharing their pages.

No, temp buffers are *only* used for temp tables.  Several versions back
the code acted as you say, but we got rid of that because it was more of
a headache than it was worth.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hmmm 8.1 pg_dumpall cannot dump older db's?

2005-07-07 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 Seems that it is expecting the new 'postgres' database to exist on old 
 installations?

Ooops :-(  Seems like maybe we want it to try postgres and then fall
back to trying template1?

regards, tom lane



Re: [HACKERS] Hmmm 8.1 pg_dumpall cannot dump older db's?

2005-07-07 Thread Christopher Kings-Lynne
Seems that it is expecting the new 'postgres' database to exist on old 
installations?


Ooops :-(  Seems like maybe we want it to try postgres and then fall
back to trying template1?


Actually, also ONLY assume postgres is a special database if the backend 
is 8.1 or higher.  We don't want to mess with poor people who have 
already created a database called 'postgres' in their installation of 
7.4, say...


Chris




Re: [HACKERS] Hmmm 8.1 pg_dumpall cannot dump older db's?

2005-07-07 Thread Christopher Kings-Lynne
Seems that it is expecting the new 'postgres' database to exist on old 
installations?


Ooops :-(  Seems like maybe we want it to try postgres and then fall
back to trying template1?


No idea :)  I haven't followed the new postgres database changes 
particularly well...


Chris




Re: [HACKERS] Hmmm 8.1 pg_dumpall cannot dump older db's?

2005-07-07 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 Ooops :-(  Seems like maybe we want it to try postgres and then fall
 back to trying template1?

 Actually, also ONLY assume postgres is a special database if the backend 
 is 8.1 or higher.  We don't want to mess with poor people who have 
 already created a database called 'postgres' in their installation of 
 7.4, say...

No, because it's special anyway where the dump will be reloaded.  Keep
in mind that the design assumption for pg_dump(all) is always that the
destination database will be current release or higher; frequently the
dump file won't even be parseable by older servers.

The thing that makes this slightly painful is that we can't tell what
version we are dumping *from* until we've connected, and so we cannot
automagically do the right thing here.  I don't really see any other
way to do it than the try-and-fallback approach.
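
[The control flow being proposed, sketched in Python for clarity — pg_dumpall itself is C, and connect_fn / old_server here are stand-ins invented for the example: try "postgres" first, and if the server predates it, fall back to "template1".]

```python
# Try-and-fallback connection logic, as discussed above.

def connect_with_fallback(connect_fn, candidates=("postgres", "template1")):
    last_err = None
    for dbname in candidates:
        try:
            return connect_fn(dbname)
        except ConnectionError as err:   # db missing on a pre-8.1 server
            last_err = err
    raise last_err                       # nothing worked: report last failure

# Simulated 7.4 server: no 'postgres' database exists yet.
def old_server(dbname):
    if dbname != "template1":
        raise ConnectionError(f'database "{dbname}" does not exist')
    return f"connection to {dbname}"

print(connect_with_fallback(old_server))   # connection to template1
```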

regards, tom lane



Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 If so, please undo the previous patch (which disabled page dumping
 entirely) and instead try removing this block of code, starting
 at about xlog.c line 620 in CVS tip:

 Will do.  Results in a few days.

Great.  BTW, don't bother testing snapshots between 2005/07/05 2300 EDT
and just now --- Bruce's full_page_writes patch introduced a large
random negative component into the timing ...

regards, tom lane



Re: [HACKERS] process crash when a plpython function returns

2005-07-07 Thread James William Pye
On Mon, 2005-06-27 at 08:12 -0600, Michael Fuhr wrote:
  also in this context it would be helpful
  if sys.defaultencoding would be set to
  the database encoding so strings get encoded
  to utf-8 when postgres works in unicode mode
  rather than the default encoding of ascii.
  This could avoid most of the PyObject_Str()
  exceptions in the first place.
 
 I haven't looked at doing that yet and probably won't before feature
 freeze.  Gerrit van Dyk has expressed an interest in hacking on
 PL/Python (he recently submitted a SETOF patch) so maybe he'll work
 on it.

I have this fixed, for the most part, in PL/Py. What I have done might
be a good starting place for someone who wants to get it fixed in core.

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/python/be/src/encoding.c

This file makes using PostgreSQL encodings in Python a more friendly
experience by setting up some aliases. (u'óäæ'.encode('UNICODE') would
work in 8.0)

Also, to set the default encoding used by Python's Unicode strings:
  PyUnicode_SetDefaultEncoding(PyEncoding_FromPgEncoding(GetDatabaseEncoding()))

PyEncoding_FromPgEncoding is defined in encoding.c.
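
[A small Python-level illustration of the failure mode, independent of PL/Python internals: with an effective default encoding of ascii, stringifying non-ASCII database values raises UnicodeEncodeError, whereas encoding with the database encoding (UTF-8 here) succeeds — which is what pointing the default at GetDatabaseEncoding() buys you.]

```python
# Why an ascii default encoding breaks on non-ASCII data.

text = u"\u00f3\u00e4\u00e6"        # 'óäæ'

try:
    text.encode("ascii")            # what an ascii default amounts to
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8 = text.encode("utf-8")         # database encoding of UTF8

print(ascii_ok)                     # False
print(utf8)                         # b'\xc3\xb3\xc3\xa4\xc3\xa6'
```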

Also, it should be noted that to get the interpreter to read the
function code as a specific encoding, one must use, afaik, the # -*-
encoding: utf-8 -*- magic.
-- 
Regards, James William Pye
