[HACKERS] Online base backup from the hot-standby

2011-05-26 Thread Jun Ishiduka
Hi

I would like to develop support for taking an online base backup from
a hot-standby server, for PostgreSQL 9.2.

Todo: Allow hot file system backups on standby servers
(http://wiki.postgresql.org/wiki/Todo)


[GOAL]
 * Make pg_basebackup able to run against a hot-standby server and
   take an online base backup from it.
 - In PostgreSQL 9.1, pg_basebackup can only be run against the
   primary server.
 - But the physical copy performed while pg_basebackup runs
   increases the load on the primary server.
 - Therefore, this feature is needed.

[Problem]
(The following problems arise when a hot standby tries to take an
 online base backup the way pg_basebackup does against the primary
 server.)
 * pg_start_backup() and pg_stop_backup() cannot be executed on the
   hot-standby server.
 - The hot standby cannot insert a backup-end record into the WAL
   and cannot perform a CHECKPOINT,
 - because a hot standby cannot write anything to the WAL.
 * The hot standby cannot send WAL files to the archive server.
 - When pg_stop_backup() is executed on the primary server, it waits
   until the WAL has been sent to the archive server, but a hot
   standby cannot do that.

[Policy]
(I will implement this with the following policy.)
 * This feature must not affect the primary server.
 - I do not adopt the approach in which the hot standby asks the
   primary to execute pg_basebackup, because many standbys may be
   connected to one primary.

[Approach]
 * When pg_basebackup is executed against the hot-standby server, it
   performs a RESTARTPOINT instead of a CHECKPOINT.
   backup_label is built from the restartpoint's results and sent to
   the designated backup server over the pg_basebackup connection.
 * Instead of inserting a backup-end record, the hot standby writes
   the backup-end position into the backup-history-file and sends it
   to the designated backup server over the pg_basebackup connection.
 - In 9.1, the startup process learns the backup-end position only
   from the backup-end record.  In addition to that logic, the
   startup process will be able to learn the backup-end position from
   the backup-history-file.  As a result, the startup process can
   recover reliably without a backup-end record.
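To illustrate the second point, the startup process could pull the backup-end position out of a backup-history-file with something like the sketch below.  The line format is an assumption based on what pg_stop_backup() archives, and parse_backup_end_position() is a hypothetical helper, not actual 9.1 code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch: scan backup-history-file text for the "STOP WAL LOCATION"
 * line and parse the WAL position out of it.  Returns 1 on success,
 * 0 if the line is missing or malformed.
 */
static int
parse_backup_end_position(const char *history, unsigned int *xlogid,
                          unsigned int *xrecoff)
{
    const char *p = strstr(history, "STOP WAL LOCATION:");

    if (p == NULL)
        return 0;
    return sscanf(p, "STOP WAL LOCATION: %X/%X", xlogid, xrecoff) == 2;
}
```

With a position recovered this way, the startup process would know how far it must replay before the backup is consistent, instead of waiting for a backup-end record that a standby can never write.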

[Precondition]
(As a result of the above policy and approach, the following
 restrictions apply.)
 * The WAL written immediately after the backup starts must contain
   full page writes, but the approach above cannot always guarantee
   that, because full_page_writes on the primary might be 'off'.
   The same applies when the standby replays WAL from which full page
   writes have been removed by pg_lesslog.
 * Because recovery starts from the last CHECKPOINT, it takes longer.
 * I have not yet designed a replacement for waiting until the WAL
   has been sent to the archive server.

[Working Step]
 STEP 1: Teach the startup process to acquire the backup-end position
not only from the backup-end record but also from the
backup-history-file.
  * The startup process is allowed to acquire the backup-end position
from the backup-history-file.
  * When pg_basebackup is executed, the backup-history-file is sent
to the designated backup server.
 
 STEP 2: Make pg_start_backup() and pg_stop_backup() executable on
the hot-standby server.
 
[Action until the first CommitFest (on June 15)]
I will create a patch for STEP 1.
(The patch should also solve a problem with omnipitr-backup-slave.)
(the omnipitr-backup-slave problem:
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01490.php)
  * STEP 2 is scheduled for the next CommitFest (on September 15).




Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Stephen Frost
Greg,

* Greg Stark (gsst...@mit.edu) wrote:
> On Thu, May 26, 2011 at 8:52 PM, Stephen Frost  wrote:
> >  list_concat() does explicitly say that cells will
> > be shared afterwards and that you can't pfree() either list (note that
> > there's actually a couple cases currently that I discovered which were
> > also addressed in the original patch where I commented out those
> > pfree()'s).
> 
> So in traditional list it would splice the second argument onto the
> end of the first list. This has a few effects that it sounds like you
> haven't preserved. For example if I insert an element anywhere in
> list2 -- including in the first few elements -- it's also inserted
> into list1.

Reading through the comments, it doesn't look like we expressly forbid
that, but it seems pretty unlikely that it's done.  In any case, it
wouldn't be difficult to fix, to be honest..  All we'd have to do is
modify list2's head pointer to point to the new location.  We do say
that list1 is destructively changed and that the returned pointer must
be used going forward.

> I'm not really sure we care about these semantics with our lists
> though. It's not like they're supposed to be a full-featured lisp
> emulator and it's not like the C code pulls any particularly clever
> tricks with lists. I suspect we may have already broken these
> semantics long ago but I haven't looked to see if that's the case.

It doesn't look like it was broken previously, but at the same time, it
doesn't look like those semantics are depended upon (or at least,
they're not tested through the regressions :).

Thanks,

Stephen




Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Stephen Frost
* Greg Stark (gsst...@mit.edu) wrote:
> On Thu, May 26, 2011 at 11:57 AM, Stephen Frost  wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > While I agree that there is some bloat that'll happen with this
> > approach, we could reduce it by just having a 4-entry cache instead of
> > an 8-entry cache.  I'm not really sure that saving those 64 bytes per
> > list is really worth it though.
> 
> First off this whole direction seems a bit weird to me. It sounds like
> you're just reimplementing palloc inside the List data structure with
> its allocator and everything. Why not just improve the memory
> allocator in palloc instead of layering a second one on top of it?

I do think it'd be great to improve palloc(), but having looked through
that code, figuring out how to improve it for the small case (such as
with the lists) while keeping it working well for larger and other cases
doesn't exactly look trivial.

> But assuming there's an advantage I've missed there's another
> possibility here: Are most of these small lists constructed with
> list_makeN? 

Looks like we've got 306 cases of list_make1(), 82 cases of list_makeN()
(where N > 1), but that said, one can make a list w/ just lappend(), and
that seems to happen with some regularity.

> But all this seems odd to me. The only reason for any of this is for
> api convenience so we can pass around lists instead of passing arrays.
> If the next links are really a big source of overhead we should just
> fix our apis to take arrays of the right length or arrays with a
> separate length argument.

I'm not really sure I agree with this..  Lists are pretty useful and
easier to manage when you don't know the size.  I expect quite a few of
these lists are small for simple queries and can get pretty large for
complex queries.  Also, in many cases it's natural to step through the
list and not need random access into it, which at least reduces the
reasons to go to the effort of having a variable length array.

> Or if it's just palloc we should fix our memory allocator to behave
> the way the callers need it to. Heikki long ago suggested adding a
> stack allocator for the parser to use for its memory context to reduce
> overhead of small allocations which won't be freed until the context
> is freed for example.

Much of this originated from Greg's oprofile and Tom's further
commentary on it here:

http://archives.postgresql.org/pgsql-hackers/2011-04/msg00714.php

Thanks,

Stephen




Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Greg Stark
On Thu, May 26, 2011 at 8:52 PM, Stephen Frost  wrote:
>  list_concat() does explicitly say that cells will
> be shared afterwards and that you can't pfree() either list (note that
> there's actually a couple cases currently that I discovered which were
> also addressed in the original patch where I commented out those
> pfree()'s).

So in traditional list it would splice the second argument onto the
end of the first list. This has a few effects that it sounds like you
haven't preserved. For example if I insert an element anywhere in
list2 -- including in the first few elements -- it's also inserted
into list1.
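A tiny sketch of the traditional splice semantics described above, showing why an insertion made through list2 becomes visible through list1 (illustrative only; this is not PostgreSQL's pg_list.c, and the names are made up):

```c
#include <stdlib.h>

typedef struct Cell { int value; struct Cell *next; } Cell;
typedef struct Lst  { Cell *head; Cell *tail; } Lst;

static Cell *
mkcell(int v)
{
    Cell *c = calloc(1, sizeof(Cell));
    c->value = v;
    return c;
}

static Lst *
mklist(int v)
{
    Lst *l = calloc(1, sizeof(Lst));
    l->head = l->tail = mkcell(v);
    return l;
}

/* Traditional splice: list2's cells become the tail of list1. */
static Lst *
concat_splice(Lst *list1, Lst *list2)
{
    list1->tail->next = list2->head;
    list1->tail = list2->tail;
    return list1;
}

/* Insert a new value right after list2's first cell. */
static void
insert_after_head(Lst *l, int v)
{
    Cell *c = mkcell(v);

    c->next = l->head->next;
    l->head->next = c;
    if (l->tail == l->head)
        l->tail = c;
}
```

After concat_splice(l1, l2), both headers describe the same cells, so insert_after_head(l2, ...) shows up when walking l1 -- exactly the sharing that any cached-cell scheme has to preserve or explicitly break.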

I'm not really sure we care about these semantics with our lists
though. It's not like they're supposed to be a full-featured lisp
emulator and it's not like the C code pulls any particularly clever
tricks with lists. I suspect we may have already broken these
semantics long ago but I haven't looked to see if that's the case.




-- 
greg



Re: [HACKERS] Expression Evaluator used for creating the plan tree / stmt ?

2011-05-26 Thread Vaibhav Kaushal
Thanks, Tom.  Compared to you people, I am definitely new to almost
everything here.  I have debugged a few smaller programs and never seen
anything like this, so I asked.  Moreover, those programs were never
compiled with any optimization.

While everything seems to be working, it looks like the slot values do
not change, and all rows in a sequential scan return the first value it
finds on disk, n times, where n = the number of rows in the table!  I am
going to compile without optimization now.  Hopefully that will change a
few things in the debugging process.

Seems beautiful, complicated, mysterious. And I thought I was beginning to
understand computers. :)

Whatever be the case, I will look more into it and ask again if I get into
too much of trouble.

Regards,
Vaibhav

On Fri, May 27, 2011 at 9:18 AM, Tom Lane  wrote:

> Vaibhav Kaushal  writes:
> > Why do these lines:
> > ...
> > repeat twice?
>
> Hm, you're new to using gdb, no?  That's pretty normal: gdb is just
> reflecting back the fact that the compiler rearranges individual
> instructions as it sees fit.  You could eliminate most, though perhaps
> not all, of that noise if you built the program-under-test (ie postgres)
> at -O0.
>
>regards, tom lane
>


Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Stephen Frost
* Greg Stark (gsst...@mit.edu) wrote:
> On Thu, May 26, 2011 at 11:57 AM, Stephen Frost  wrote:
> > Handling the 1-entry case would likely be pretty
> > straight-forward, but you need book-keeping as soon as you go to two,
> > and all that book-keeping feels like overkill for just a 2-entry cache
> > to me.
> 
> Incidentally what if I call nconc and pass a second arg of a list that
> has the first few elements stashed in an array. Do you copy those
> elements into cells before doing the nconc? Does our nconc support
> having lists share cells? I suspect it doesn't actually so perhaps
> that's good enough.

nconc() turns into list_concat() which turns into adding list2 on to the
end of list1 using the other normal lappend() routines which will
utilize space in the cache of list1 if there is space available.  Trying
to use the old list2 for storage or much of anything turned into a real
pain, unfortunately.  list_concat() does explicitly say that cells will
be shared afterwards and that you can't pfree() either list (note that
there's actually a couple cases currently that I discovered which were
also addressed in the original patch where I commented out those
pfree()'s).

Thanks,

Stephen




Re: [HACKERS] Expression Evaluator used for creating the plan tree / stmt ?

2011-05-26 Thread Tom Lane
Vaibhav Kaushal  writes:
> Why do these lines:
> ...
> repeat twice?

Hm, you're new to using gdb, no?  That's pretty normal: gdb is just
reflecting back the fact that the compiler rearranges individual
instructions as it sees fit.  You could eliminate most, though perhaps
not all, of that noise if you built the program-under-test (ie postgres)
at -O0.

regards, tom lane



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Greg Stark
On Thu, May 26, 2011 at 11:57 AM, Stephen Frost  wrote:
> Handling the 1-entry case would likely be pretty
> straight-forward, but you need book-keeping as soon as you go to two,
> and all that book-keeping feels like overkill for just a 2-entry cache
> to me.

Incidentally what if I call nconc and pass a second arg of a list that
has the first few elements stashed in an array. Do you copy those
elements into cells before doing the nconc? Does our nconc support
having lists share cells? I suspect it doesn't actually so perhaps
that's good enough.

-- 
greg



Re: [HACKERS] Expression Evaluator used for creating the plan tree / stmt ?

2011-05-26 Thread Vaibhav Kaushal
OK, I ran a GDB trace into ExecScan and here is a part of it:

#

(gdb) finish
Run till exit from #0  ExecScanFetch (node=0x1d5c3c0,
accessMtd=0x55dd10 , recheckMtd=0x55db70 )
at execScan.c:44
194 if (TupIsNull(slot))
(gdb) s
205 econtext->ecxt_scantuple = slot;
(gdb) s
206 int num_atts = slot->tts_tupleDescriptor->natts;
(gdb) s
207 elog(INFO, "[start] BEFORE ExecQual===");
(gdb) s
206 int num_atts = slot->tts_tupleDescriptor->natts;
(gdb) s
207 elog(INFO, "[start] BEFORE ExecQual===");
(gdb) s
elog_start (filename=0x7c9db2 "execScan.c", lineno=207,
funcname=0x7c9e69 "ExecScan") at elog.c:1089
1089 {
(gdb)

##

Why do these lines:



206 int num_atts = slot->tts_tupleDescriptor->natts;
(gdb) s
207 elog(INFO, "[start] BEFORE ExecQual===");



repeat twice?  I have written them only once!  The GDB documentation
does not help!  On a few forums, people accuse me of anything from bad
programming to recursion.  Any idea?  I never face this with the rest of
the code (and in no other program).  I am on Fedora 13 x86_64.

Regards,
Vaibhav


On Wed, May 25, 2011 at 11:45 PM, Vaibhav Kaushal <
vaibhavkaushal...@gmail.com> wrote:

> I think the command 'where' does the same. And the command showed something
> which looked like was part of evaluation...it got me confused. Anyways,
> thanks robert. I will check that too. I did not know the 'bt' command.
>
> --
> Sent from my Android
> On 25 May 2011 23:02, "Robert Haas"  wrote:
>


Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Greg Stark
On Thu, May 26, 2011 at 11:57 AM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I'm worried that this type of approach would
>> bloat the storage required in those cases to a degree that would make
>> the patch unattractive.
>
> While I agree that there is some bloat that'll happen with this
> approach, we could reduce it by just having a 4-entry cache instead of
> an 8-entry cache.  I'm not really sure that saving those 64 bytes per
> list is really worth it though.

First off this whole direction seems a bit weird to me. It sounds like
you're just reimplementing palloc inside the List data structure with
its allocator and everything. Why not just improve the memory
allocator in palloc instead of layering a second one on top of it?

But assuming there's an advantage I've missed there's another
possibility here: Are most of these small lists constructed with
list_makeN? In which case maybe the trick would be to special case the
initial contents by hard coding a variable sized array which
represents the first N elements and is only constructed when the list
is first constructed with its initial values. So a list make with
list_make3() would have a 3 element array and then any further
elements added would be in the added cons cells. If any of those were
removed we would decrement the count but leave the array in place.

This would reduce the overhead of any small static lists that aren't
modified much which is probably the real case we're talking about.
Things like operator arguments or things constructed in the parse
tree.
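The idea can be sketched roughly like this -- a toy illustration only, not PostgreSQL's pg_list.h, with an assumed inline-array size of 3:

```c
#include <stdlib.h>

#define LIST_INLINE_CELLS 3     /* assumed size of the inline prefix */

typedef struct ListCell
{
    void            *value;
    struct ListCell *next;
} ListCell;

/*
 * Sketch: the first few elements live in an array embedded in the
 * List header; anything appended beyond that spills into ordinary
 * cons cells.
 */
typedef struct List
{
    int       length;
    int       n_inline;         /* how many inline slots are used */
    void     *inline_cells[LIST_INLINE_CELLS];
    ListCell *head;             /* overflow chain */
    ListCell *tail;
} List;

static List *
list_make1_sketch(void *v)
{
    List *l = calloc(1, sizeof(List));

    l->inline_cells[0] = v;
    l->n_inline = 1;
    l->length = 1;
    return l;
}

static void
list_append_sketch(List *l, void *v)
{
    if (l->n_inline < LIST_INLINE_CELLS && l->head == NULL)
        l->inline_cells[l->n_inline++] = v;
    else
    {
        ListCell *c = calloc(1, sizeof(ListCell));

        c->value = v;
        if (l->tail)
            l->tail->next = c;
        else
            l->head = c;
        l->tail = c;
    }
    l->length++;
}

static void *
list_nth_sketch(List *l, int n)
{
    ListCell *c;

    if (n < l->n_inline)
        return l->inline_cells[n];
    c = l->head;
    for (n -= l->n_inline; n > 0; n--)
        c = c->next;
    return c->value;
}
```

A list of up to three elements built this way needs a single allocation, which is the payoff for the small static lists the parse tree is full of.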

The cost would be the risk of bugs that only occur when something is
passed a 2-element list that was made with list_make2() but not one
made by list_make1() + list_append() or vice versa.

This has the side benefit of allowing an arbitrarily large initial
array (well, as large as the length field for the array size allows)
if we wanted to have something like list_copy_static() which made a
list that was expected not to be modified a lot subsequently and might
as well be stored in a single large array.

But all this seems odd to me. The only reason for any of this is for
api convenience so we can pass around lists instead of passing arrays.
If the next links are really a big source of overhead we should just
fix our apis to take arrays of the right length or arrays with a
separate length argument.

Or if it's just palloc we should fix our memory allocator to behave
the way the callers need it to. Heikki long ago suggested adding a
stack allocator for the parser to use for its memory context to reduce
overhead of small allocations which won't be freed until the context
is freed for example.

-- 
greg



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-26 Thread Greg Smith

On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:

For 1, I've just finished my work.  The latest patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
   


Reminder here--we can't accept code based on it being published to a web 
page.  You'll need to e-mail it to the pgsql-hackers mailing list to be 
considered for the next PostgreSQL CommitFest, which is starting in a 
few weeks.  Code submitted to the mailing list is considered a release 
of it to the project under the PostgreSQL license, which we can't just 
assume for things when given only a URL to them.


Also, you suggested you were out of time to work on this.  If that's the 
case, we'd like to know that so we don't keep cc'ing you about things in 
expectation of an answer.  Someone else may pick this up as a project to 
continue working on.  But it's going to need a fair amount of revision 
before it matches what people want here, and I'm not sure how much of 
what you've written is going to end up in any commit that may happen 
from this idea.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] LOCK DATABASE

2011-05-26 Thread Michael Paquier
Hi all,

On Fri, May 27, 2011 at 2:13 AM, Robert Haas  wrote:

> On Thu, May 26, 2011 at 12:28 PM, Ross J. Reedstrom 
> wrote:
> > Perhaps the approach to restricting connections should not be a database
> > object lock, but rather an admin function that does the equivalent of
> > flipping datallowconn in pg_database?
>
> To me, that seems like a better approach, although it's a little hard
> to see how we'd address Alvaro's desire to have it roll back
> automatically when the session disconnected.  The disconnect might be
> caused by a FATAL error, for example.
>
> I'm actually all in favor of doing more things via SQL rather than
> configuration files.  The idea of some ALTER SYSTEM command seems very
> compelling to me.  I just don't really like this particular
> implementation, which to me seems far too bound up in implementation
> details I'd rather not rely on.
>
Me too, though it looks like I'm a little late on this topic, even if I
have some interest in it.
Personally, I think such a lock system that plays with the file system
is perhaps not the best way of doing it, as argued until now.  It would
let the DBA perform superuser-like actions by modifying system files
like pg_hba.conf.
The SQL approach looks better.
At this point, you may be interested in such an approach:
http://wiki.postgresql.org/wiki/Lock_database
I wrote that after the cluster summit.

Regards,
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Tom Lane
"Kevin Grittner"  writes:
> When we prune or vacuum a page, I don't suppose we have enough
> information about that page's previous state to calculate a tuple
> count delta, do we?  That would allow a far more accurate number to
> be maintained than anything suggested so far, as long as we tweak
> autovacuum to count inserts toward the need to vacuum.

Well, that was the other direction that was suggested upthread: stop
relying on reltuples at all, but use the stats collector's counts.
That might be a good solution in the long run, but there are some
issues:

1. It's not clear how using a current count, as opposed to
time-of-last-vacuum count, would affect the behavior of the autovacuum
control logic.  At first glance I think it would break it, since the
basic logic there is "how much of the table changed since it was last
vacuumed?".  Even if the equations could be modified to still work,
I remember enough feedback control theory from undergrad EE to think that
this is something to be seriously scared of tweaking without extensive
testing.  IMO it is far more risky than what Robert is worried about.

2. You still have the problem that we're exposing inaccurate (or at
least less accurate than they could be) counts to the planner and to
onlooker clients.  We could change the planner to also depend on the
stats collector instead of reltuples, but at that point you just removed
the option for people to turn off the stats collector.  The implications
for plan stability might be unpleasant, too.

So that's not a direction I want to go without a significant amount
of work and testing.

regards, tom lane



Re: [HACKERS] "errno" not set in case of "libm" functions (HPUX)

2011-05-26 Thread Tom Lane
Peter Eisentraut  writes:
> On tor, 2011-05-26 at 12:14 -0400, Tom Lane wrote:
>> I tried this on my HP-UX 10.20 box, and it didn't work very nicely:
>> configure decided that the compiler accepted +Olibmerrno, so I got a
>> compile full of
>>  cc: warning 450: Unrecognized option +Olibmerrno.
>> warnings.  The reason is that PGAC_PROG_CC_CFLAGS_OPT does not pay any
>> attention to whether the proposed flag generates a warning.  That seems
>> like a bug --- is there any situation where we'd want to accept a flag
>> that does generate a warning?  I'm thinking that macro should set
>> ac_c_werror_flag=yes, the same way PGAC_C_INLINE does.

> I think so.

OK, committed with that addition.

> We could also do that globally, but that would probably be something for
> the next release.

Hmm.  I'm a bit scared of how much might break.  I don't think the
autoconf tests are generally designed to guarantee no warnings.

regards, tom lane



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:
 
>> By storing the ratio and one count you make changes to the
>> other count implied and less visible.  It seems more
>> understandable and less prone to error (to me, anyway) to keep
>> the two "raw" numbers and calculate the ratio -- and when you
>> observe a change in one raw number which you believe should force
>> an adjustment to the other raw number before its next actual
>> value is observed, to comment on why that's a good idea, and do
>> the trivial arithmetic at that time.
> 
> Except that's not how it works.  At least in the case of ANALYZE,
> we *aren't* counting all the tuples in the table.  We're selecting
> a random sample of pages and inferring a tuple density, which we
> then extrapolate to the whole table and store.  Then when we pull
> it back out of the table, we convert it back to a tuple density. 
> The real value we are computing and using almost everywhere is
> tuple density; storing a total number of tuples in the table
> appears to be just confusing the issue.
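The extrapolation described above can be sketched as follows (an illustration of the arithmetic only, not the actual ANALYZE estimator):

```c
/*
 * Sketch: sample some pages, compute a tuple density from the sample,
 * and extrapolate to the whole table.  This mirrors the idea that the
 * real quantity being measured is tuples-per-page.
 */
static double
estimate_reltuples_sketch(double sampled_pages, double sampled_tuples,
                          double relpages)
{
    double density;

    if (sampled_pages <= 0)
        return 0;
    density = sampled_tuples / sampled_pages;   /* tuples per page */
    return density * relpages;                  /* extrapolate */
}
```

Whether the stored value is the density or the extrapolated total, the same information is recoverable from the other number plus relpages, which is the point being argued here.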
 
Well, if tuple density is the number which is most heavily used, it
might shave a few nanoseconds doing the arithmetic in enough places
to justify the change, but I'm skeptical.  Basically I'm with Tom on
the fact that this change would store neither more nor less
information (and for that matter would not really change what
information you can easily retrieve); and slightly changing the
manner in which it is stored doesn't solve any of the problems you
assert that it does.
 
When we prune or vacuum a page, I don't suppose we have enough
information about that page's previous state to calculate a tuple
count delta, do we?  That would allow a far more accurate number to
be maintained than anything suggested so far, as long as we tweak
autovacuum to count inserts toward the need to vacuum.  (It seems to
me I saw a post giving some reason that would have benefits anyway.)
Except for the full pass during transaction wrap-around protection,
where it could just set a new actual count, autovacuum would be
skipping pages where the bit is set to indicate that all tuples are
visible, right?
 
-Kevin



Re: [HACKERS] "errno" not set in case of "libm" functions (HPUX)

2011-05-26 Thread Peter Eisentraut
On tor, 2011-05-26 at 12:14 -0400, Tom Lane wrote:
> Ibrar Ahmed  writes:
> > Please find the updated patch. I have added this "+Olibmerrno" compile flag
> > check in configure/configure.in file.
> 
> I tried this on my HP-UX 10.20 box, and it didn't work very nicely:
> configure decided that the compiler accepted +Olibmerrno, so I got a
> compile full of
>   cc: warning 450: Unrecognized option +Olibmerrno.
> warnings.  The reason is that PGAC_PROG_CC_CFLAGS_OPT does not pay any
> attention to whether the proposed flag generates a warning.  That seems
> like a bug --- is there any situation where we'd want to accept a flag
> that does generate a warning?  I'm thinking that macro should set
> ac_c_werror_flag=yes, the same way PGAC_C_INLINE does.

I think so.

We could also do that globally, but that would probably be something for
the next release.





Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-26 Thread Tom Lane
Peter Eisentraut  writes:
> On tor, 2011-05-26 at 16:54 -0400, Tom Lane wrote:
>> But if you want to take such an extension into account right now,
>> maybe we ought to design that feature now.  What are you seeing it as
>> looking like?
>> 
>> My thought is that "-z" should just mean "give me compression; a good
>> default compression setting is fine".  "-Zn" could mean "I want gzip
>> with exactly this compression level" (thus making the presence or
>> absence of -z moot).  If you want to specify some other compression
>> method altogether, use something like --lzma=N.  It seems unlikely to
>> me that somebody who wants to override the default compression method
>> wouldn't want to pick the settings for it too. 

> I think of pg_basebackup as analogous to tar.  tar has a bunch of
> options to set a compression method (-Z, -z, -j, -J), but no support for
> setting compression specific options.  So in that sense that contradicts
> your suspicion.

I would think we'd be more concerned about preserving an analogy to
pg_dump, which most certainly does expose compression-quality options.

regards, tom lane



Re: [HACKERS] pg_basebackup compressed tar to stdout

2011-05-26 Thread Peter Eisentraut
On tor, 2011-05-26 at 17:06 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > pg_basebackup currently doesn't allow compressed tar to stdout.  That
> > should be added to make the interface consistent, and specifically to
> > allow common idioms like
> 
> > pg_basebackup -Ft -z -D - | ssh tar -x -z -f -
> 
> > Small patch attached.
> 
> I have not bothered to read this in context, but the visible part of the
> patch makes it look like you broke the not-HAVE_LIBZ case ... other than
> that gripe, no objection.

Ah yes, that needs some fine-tuning.




Re: [HACKERS] pg_basebackup compressed tar to stdout

2011-05-26 Thread Tom Lane
Peter Eisentraut  writes:
> pg_basebackup currently doesn't allow compressed tar to stdout.  That
> should be added to make the interface consistent, and specifically to
> allow common idioms like

> pg_basebackup -Ft -z -D - | ssh tar -x -z -f -

> Small patch attached.

I have not bothered to read this in context, but the visible part of the
patch makes it look like you broke the not-HAVE_LIBZ case ... other than
that gripe, no objection.

regards, tom lane



Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-26 Thread Peter Eisentraut
On tor, 2011-05-26 at 16:54 -0400, Tom Lane wrote:
> But if you want to take such an extension into account right now,
> maybe we ought to design that feature now.  What are you seeing it as
> looking like?
> 
> My thought is that "-z" should just mean "give me compression; a good
> default compression setting is fine".  "-Zn" could mean "I want gzip
> with exactly this compression level" (thus making the presence or
> absence of -z moot).  If you want to specify some other compression
> method altogether, use something like --lzma=N.  It seems unlikely to
> me that somebody who wants to override the default compression method
> wouldn't want to pick the settings for it too. 

I think of pg_basebackup as analogous to tar.  tar has a bunch of
options to set a compression method (-Z, -z, -j, -J), but no support for
setting compression specific options.  So in that sense that contradicts
your suspicion.
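For what it's worth, the -z / -Zn interplay Tom proposes could be resolved along these lines (a sketch only; the names and the default level 6 are assumptions, and pg_basebackup's real option handling differs):

```c
#define DEFAULT_COMPRESSION 6   /* assumed "good default" gzip level */

/*
 * Sketch of Tom's proposed semantics: -Z n forces an explicit gzip
 * level (making -z moot); -z alone asks for compression at a default
 * level.  Returns the effective level, 0 meaning "no compression".
 */
static int
effective_compresslevel(int z_given, int explicit_level)
{
    if (explicit_level > 0)
        return explicit_level;  /* explicit -Z n wins */
    if (z_given)
        return DEFAULT_COMPRESSION;
    return 0;
}
```

Under these semantics "-z", "-Z9", and "-z -Z9" all compress, and only the last two pin the level.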




[HACKERS] pg_basebackup compressed tar to stdout

2011-05-26 Thread Peter Eisentraut
pg_basebackup currently doesn't allow compressed tar to stdout.  That
should be added to make the interface consistent, and specifically to
allow common idioms like

pg_basebackup -Ft -z -D - | ssh tar -x -z -f -

Small patch attached.
diff --git i/doc/src/sgml/ref/pg_basebackup.sgml w/doc/src/sgml/ref/pg_basebackup.sgml
index 8a7b833..32fa9f8 100644
--- i/doc/src/sgml/ref/pg_basebackup.sgml
+++ w/doc/src/sgml/ref/pg_basebackup.sgml
@@ -174,8 +174,7 @@ PostgreSQL documentation
   

 Enables gzip compression of tar file output. Compression is only
-available when generating tar files, and is not available when sending
-output to standard output.
+available when using the tar format.

   
  
diff --git i/src/bin/pg_basebackup/pg_basebackup.c w/src/bin/pg_basebackup/pg_basebackup.c
index 1f31fe0..713c3af 100644
--- i/src/bin/pg_basebackup/pg_basebackup.c
+++ w/src/bin/pg_basebackup/pg_basebackup.c
@@ -261,7 +261,20 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		 * Base tablespaces
 		 */
 		if (strcmp(basedir, "-") == 0)
-			tarfile = stdout;
+		{
+			if (compresslevel > 0)
+			{
+				ztarfile = gzdopen(dup(fileno(stdout)), "wb");
+				if (gzsetparams(ztarfile, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK)
+				{
+					fprintf(stderr, _("%s: could not set compression level %i: %s\n"),
+							progname, compresslevel, get_gz_error(ztarfile));
+					disconnect_and_exit(1);
+				}
+			}
+			else
+				tarfile = stdout;
+		}
 		else
 		{
 #ifdef HAVE_LIBZ
@@ -384,7 +397,12 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 }
 			}
 
-			if (strcmp(basedir, "-") != 0)
+			if (strcmp(basedir, "-") == 0)
+			{
+				if (ztarfile)
+					gzclose(ztarfile);
+			}
+			else
 			{
 #ifdef HAVE_LIBZ
 if (ztarfile != NULL)
@@ -1076,14 +1094,6 @@ main(int argc, char **argv)
 progname);
 		exit(1);
 	}
-#else
-	if (compresslevel > 0 && strcmp(basedir, "-") == 0)
-	{
-		fprintf(stderr,
-				_("%s: compression is not supported on standard output\n"),
-				progname);
-		exit(1);
-	}
 #endif
 
 	/*



Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-26 Thread Tom Lane
Peter Eisentraut  writes:
> On tis, 2011-05-24 at 15:34 -0400, Tom Lane wrote:
>> I would argue that -Z ought to turn on "gzip" without my having to write
>> -z as well (at least when the argument is greater than zero; possibly
>> -Z0 should be allowed as meaning "no compression"). 

> My concern with that is that if we ever add another compression method,
> would we then add another option to control the compression level of
> that method?

Um ... what's your point?  Forcing the user to type two switches instead
of one isn't going to make that hypothetical future extension any
easier, AFAICS.

But if you want to take such an extension into account right now, maybe
we ought to design that feature now.  What are you seeing it as looking
like?

My thought is that "-z" should just mean "give me compression; a good
default compression setting is fine".  "-Zn" could mean "I want gzip
with exactly this compression level" (thus making the presence or
absence of -z moot).  If you want to specify some other compression
method altogether, use something like --lzma=N.  It seems unlikely to me
that somebody who wants to override the default compression method
wouldn't want to pick the settings for it too.
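
Under the semantics proposed here, the interaction of the two switches reduces to a small decision table; a sketch of just that logic (names invented, not the actual pg_basebackup option handling):

```c
/* Resolve an effective gzip level from the two switches described
 * above: -z means "give me compression, a default level is fine" and
 * -Z n means "gzip at exactly level n", with -Z0 turning compression
 * off entirely.  Z_DEFAULT_COMPRESSION is zlib's usual -1 sentinel. */
#define Z_DEFAULT_COMPRESSION (-1)

int effective_level(int z_given, int Z_given, int Z_value)
{
	if (Z_given)
		return Z_value;					/* explicit -Zn wins; -Z0 disables */
	if (z_given)
		return Z_DEFAULT_COMPRESSION;	/* -z alone: library-default level */
	return 0;							/* neither switch: no compression */
}
```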

regards, tom lane



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 26, 2011 at 12:23 PM, Tom Lane  wrote:
>>> Another thought: Couldn't relation_needs_vacanalyze() just scale up
>>> reltuples by the ratio of the current number of pages in the relation
>>> to relpages, just as the query planner does?

>> Hmm ... that would fix Florian's immediate issue, and it does seem like
>> a good change on its own merits.  But it does nothing for the problem
>> that we're failing to put the best available information into pg_class.
>> 
>> Possibly we could compromise on doing just that much in the back
>> branches, and the larger change for 9.1?

> Do you think we need to worry about the extra overhead of determining
> the current size of every relation as we sweep through pg_class?  It's
> not a lot, but OTOH I think we'd be doing it once a minute... not sure
> what would happen if there were tons of tables.

Ugh ... that is a mighty good point, since the RelationGetNumberOfBlocks
call would have to happen for each table, even the ones we then decide
not to vacuum.  We've already seen people complain about the cost of the
AV launcher once they have a lot of databases, and this would probably
increase it quite a bit.

> Going back to your thought upthread, I think we should really consider
> replacing reltuples with reltupledensity at some point.  I continue to
> be afraid that using a decaying average in this case is going to end
> up overweighting the values from some portion of the table that's
> getting scanned repeatedly, at the expense of other portions of the
> table that are not getting scanned at all.

Changing the representation of the information would change that issue
not in the slightest.  The fundamental point here is that we have new,
possibly partial, information which we ought to somehow merge with the
old, also possibly partial, information.  Storing the data a little bit
differently doesn't magically eliminate that issue.
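
To make the merge problem concrete: a page-weighted average is one plausible way to fold a new partial estimate into the old one.  This is a sketch under that assumption, not what any PostgreSQL release actually does:

```c
/* Combine an old whole-table tuple-density estimate with a new estimate
 * obtained by scanning only part of the table: pages we did not scan
 * keep the old density, scanned pages contribute the new one.  One
 * plausible rule for merging new partial information with old partial
 * information -- illustration only. */
double merged_density(double old_density, double new_density,
					  double total_pages, double scanned_pages)
{
	double		unscanned = total_pages - scanned_pages;

	if (total_pages <= 0)
		return old_density;
	return (old_density * unscanned + new_density * scanned_pages)
		   / total_pages;
}
```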

regards, tom lane



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Tom Lane
Robert Haas  writes:
> Except that's not how it works.  At least in the case of ANALYZE, we
> *aren't* counting all the tuples in the table.  We're selecting a
> random sample of pages and inferring a tuple density, which we then
> extrapolate to the whole table and store.  Then when we pull it back
> out of the table, we convert it back to a tuple density.  The real
> value we are computing and using almost everywhere is tuple density;
> storing a total number of tuples in the table appears to be just
> confusing the issue.

If we were starting in a green field we might choose to store tuple
density.  However, the argument for changing it now is at best mighty
thin; IMO it is not worth the risk of breaking client code.

regards, tom lane



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 2:05 PM, Kevin Grittner
 wrote:
>> I'm a bit confused by this - what the current design obfuscates is
>> the fact that reltuples and relpages are not really independent
>> columns; you can't update one without updating the other, unless
>> you want screwy behavior.  Replacing reltuples by reltupledensity
>> would fix that problem - it would be logical and non-damaging to
>> update either column independently.
>
> They don't always move in tandem.  Certainly there can be available
> space in those pages from which tuples can be allocated or which
> increases as tuples are vacuumed.  Your proposed change would
> neither make more or less information available, because we've got
> two numbers which can be observed as raw counts, and a ratio between
> them.

So far I agree.

> By storing the ratio and one count you make changes to the
> other count implied and less visible.  It seems more understandable
> and less prone to error (to me, anyway) to keep the two "raw"
> numbers and calculate the ratio -- and when you observe a change in
> one raw number which you believe should force an adjustment to the
> other raw number before its next actual value is observed, to
> comment on why that's a good idea, and do the trivial arithmetic at
> that time.

Except that's not how it works.  At least in the case of ANALYZE, we
*aren't* counting all the tuples in the table.  We're selecting a
random sample of pages and inferring a tuple density, which we then
extrapolate to the whole table and store.  Then when we pull it back
out of the table, we convert it back to a tuple density.  The real
value we are computing and using almost everywhere is tuple density;
storing a total number of tuples in the table appears to be just
confusing the issue.

Unless, of course, I am misunderstanding, which is possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-26 Thread Peter Eisentraut
On tis, 2011-05-24 at 15:34 -0400, Tom Lane wrote:
> I would argue that -Z ought to turn on "gzip" without my having to
> write
> -z as well (at least when the argument is greater than zero; possibly
> -Z0 should be allowed as meaning "no compression"). 

My concern with that is that if we ever add another compression method,
would we then add another option to control the compression level of
that method?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI predicate locking on heap -- tuple or row?

2011-05-26 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> Could you explain in the README, why it is safe to only take the
> lock on the visible row version, please?
 
Sure.  I actually intended to do this last night but ran out of
steam and posted what I had, planning on following up with that.
 
The place it seemed to fit best was in the "Innovations" section,
since the SSI papers and their prototype implementations seemed
oriented toward "rows" -- certainly the SIREAD locks were at the row
level, versus a row version level.
 
Since this doesn't touch any of the files in yesterday's patch, and
it seems entirely within the realm of possibility that people will
want to argue about how best to document this more than the actual
fix, I'm posting it as a separate patch -- README-SSI only.
 
I mostly just copied from Dan's posted proof verbatim.
 
-Kevin
*** a/src/backend/storage/lmgr/README-SSI
--- b/src/backend/storage/lmgr/README-SSI
***************
*** 402,407 ****
--- 402,455 ----
is based on the top level xid.  When looking at an xid that comes
  from a tuple's xmin or xmax, for example, we always call
  SubTransGetTopmostTransaction() before doing much else with it.
  
+ * PostgreSQL does not use "update in place" with a rollback log
+ for its MVCC implementation.  Where possible it uses "HOT" updates on
+ the same page (if there is room and no indexed value is changed).
+ For non-HOT updates the old tuple is expired in place and a new tuple
+ is inserted at a new location.  Because of this difference, a tuple
+ lock in PostgreSQL doesn't automatically lock any other versions of a
+ row.  We don't try to copy or expand a tuple lock to any other
+ versions of the row, based on the following proof that any additional
+ serialization failures we would get from that would be false
+ positives:
+ 
+   o If transaction T1 reads a row (thus acquiring a predicate
+ lock on it) and a second transaction T2 updates that row, must a
+ third transaction T3 which updates the new version of the row have a
+ rw-conflict in from T1 to prevent anomalies?  In other words, does it
+ matter whether this edge T1 -> T3 is there?
+ 
+   o If T1 has a conflict in, it certainly doesn't. Adding the
+ edge T1 -> T3 would create a dangerous structure, but we already had
+ one from the edge T1 -> T2, so we would have aborted something
+ anyway.
+ 
+   o Now let's consider the case where T1 doesn't have a
+ conflict in. If that's the case, for this edge T1 -> T3 to make a
+ difference, T3 must have a rw-conflict out that induces a cycle in
+ the dependency graph, i.e. a conflict out to some transaction
+ preceding T1 in the serial order. (A conflict out to T1 would work
+ too, but that would mean T1 has a conflict in and we would have
+ rolled back.)
+ 
+   o So now we're trying to figure out if there can be an
+ rw-conflict edge T3 -> T0, where T0 is some transaction that precedes
+ T1. For T0 to precede T1, there has to be some edge, or
+ sequence of edges, from T0 to T1. At least the last edge has to be a
+ wr-dependency or ww-dependency rather than a rw-conflict, because T1
+ doesn't have a rw-conflict in. And that gives us enough information
+ about the order of transactions to see that T3 can't have a
+ rw-dependency to T0:
+  - T0 committed before T1 started (the wr/ww-dependency implies this)
+  - T1 started before T2 committed (the T1->T2 rw-conflict implies this)
+  - T2 committed before T3 started (otherwise, T3 would be aborted
+because of an update conflict)
+ 
+   o That means T0 committed before T3 started, and therefore
+ there can't be a rw-conflict from T3 to T0.
+ 
+   o In both cases, we didn't need the T1 -> T3 edge.
+ 
  * Predicate locking in PostgreSQL will start at the tuple level
  when possible, with automatic conversion of multiple fine-grained
locks to coarser granularity as needed to avoid resource exhaustion.



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I'm worried that this type of approach would
> bloat the storage required in those cases to a degree that would make
> the patch unattractive.  

While I agree that there is some bloat that'll happen with this
approach, we could reduce it by just having a 4-entry cache instead of
an 8-entry cache.  I'm not really sure that saving those 64 bytes per
list is really worth it though.  The cost of allocating the memory
doesn't seem like it changes a lot between those and I don't think it's
terribly common for us to copy lists around (copyList doesn't memcpy()
them).

> ISTM the first thing we'd need to have before
> we could think about this rationally is some measurements about the
> frequencies of different List lengths in a typical workload.

I agree, that'd be a good thing to have.  I'll look into measuring that.

> When Neil redid the List infrastructure a few years ago, there was some
> discussion of special-casing the very first ListCell, and allocating
> just that cell along with the List header.  

Well, we do allocate the first cell when we create a list in new_list(),
but it's a separate palloc() call.  One of the annoying things that I
ran into with this patch is trying to keep track of if something could
be free'd with pfree() or not.  Can't allow pfree() of something inside
the array, etc.  Handling the 1-entry case would likely be pretty
straightforward, but you need bookkeeping as soon as you go to two,
and all that bookkeeping feels like overkill for just a 2-entry cache
to me.

I'll try to collect some info on list lengths and whatnot though and get
a feel for just how much this is likely to help.  Of course, if someone
else has time to help with that, I wouldn't complain. :)

Thanks,

Stephen




[HACKERS] #PgWest 2011: CFP now open

2011-05-26 Thread Joshua D. Drake

Hello,

The CFP for #PgWest is now open. We are holding it at the San Jose 
Convention Center from September 27th - 30th. We look forward to seeing 
your submissions.


http://www.postgresqlconference.org/

Joshua D. Drake
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:
 
>> Given how trivial it would be to adjust reltuples to keep its
>> ratio to relpages about the same when we don't have a new "hard"
>> number, but some evidence that we should fudge our previous
>> value, I don't see where this change buys us much.  It seems to
>> mostly obfuscate the fact that we're changing our assumption
>> about how many tuples we have.  I would rather that we did that
>> explicitly with code comments about why it's justified than to
>> slip it in the way you suggest.
> 
> I'm a bit confused by this - what the current design obfuscates is
> the fact that reltuples and relpages are not really independent
> columns; you can't update one without updating the other, unless
> you want screwy behavior.  Replacing reltuples by reltupledensity
> would fix that problem - it would be logical and non-damaging to
> update either column independently.
 
They don't always move in tandem.  Certainly there can be available
space in those pages from which tuples can be allocated or which
increases as tuples are vacuumed.  Your proposed change would
neither make more or less information available, because we've got
two numbers which can be observed as raw counts, and a ratio between
them.  By storing the ratio and one count you make changes to the
other count implied and less visible.  It seems more understandable
and less prone to error (to me, anyway) to keep the two "raw"
numbers and calculate the ratio -- and when you observe a change in
one raw number which you believe should force an adjustment to the
other raw number before its next actual value is observed, to
comment on why that's a good idea, and do the trivial arithmetic at
that time.
 
As a thought exercise, what happens each way if a table is loaded
with a low fillfactor and then a lot of inserts are done?  What
happens if mass deletes are done from a table which has a high
density?
 
-Kevin



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Robert Haas
On Tue, May 24, 2011 at 10:56 PM, Stephen Frost  wrote:
>  Someone (*cough*Haas*cough) made a claim over beers at PGCon that it
>  would be very difficult to come up with a way to pre-allocate List
>  entries and maintain the current List API.  I'll admit that it wasn't
>  quite as trivial as I had *hoped*, but attached is a proof-of-concept
>  patch which does it.
>
> [ various points ]

So I guess the first question here is - does it improve performance?

Because if it does, then it's worth pursuing ... if not, that's the
first thing to fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 1:28 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>> I think we should really consider replacing reltuples with
>> reltupledensity at some point.  I continue to be afraid that using
>> a decaying average in this case is going to end up overweighting
>> the values from some portion of the table that's getting scanned
>> repeatedly, at the expense of other portions of the table that are
>> not getting scanned at all.  Now, perhaps experience will prove
>> that's not a problem.  But storing relpages and reltupledensity
>> separately would give us more flexibility, because we could feel
>> free to bump relpages even when we're not sure what to do about
>> reltupledensity.  That would help Florian's problem quite a lot,
>> even if we did nothing else.
>
> Given how trivial it would be to adjust reltuples to keep its ratio
> to relpages about the same when we don't have a new "hard" number,
> but some evidence that we should fudge our previous value, I don't
> see where this change buys us much.  It seems to mostly obfuscate
> the fact that we're changing our assumption about how many tuples we
> have.  I would rather that we did that explicitly with code comments
> about why it's justified than to slip it in the way you suggest.

I'm a bit confused by this - what the current design obfuscates is the
fact that reltuples and relpages are not really independent columns;
you can't update one without updating the other, unless you want
screwy behavior.  Replacing reltuples by reltupledensity would fix
that problem - it would be logical and non-damaging to update either
column independently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] timezone GUC

2011-05-26 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Sun May 22 23:09:47 -0400 2011:
> On Sun, May 22, 2011 at 10:24 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Sun, May 22, 2011 at 9:54 PM, Tom Lane  wrote:
> >>> But also, 99.999% of the time
> >>> it would be completely wasted effort because the DBA wouldn't remove the
> >>> postgresql.conf setting at all, ever.
> >
> >> Well, by that argument, we ought not to worry about masterminding what
> >> happens if the DBA does do such a thing -- just run the whole process
> >> and damn the torpedoes.  If it causes a brief database stall, at least
> >> they'll get the correct behavior.
> >
> > Yeah, maybe.  But I don't especially want to document "If you remove a
> > pre-existing setting of TimeZone from postgresql.conf, expect your
> > database to lock up hard for multiple seconds" ... and I think we
> > couldn't responsibly avoid mentioning it.  At the moment that disclaimer
> > reads more like "If you remove a pre-existing setting of TimeZone from
> > postgresql.conf, the database will fall back to a default that might not
> > be what you were expecting".  Is A really better than B?
> 
> Well, I'm not entirely sure, but I lean toward yes.  Anyone else have
> an opinion?

Yes, I think the lock-up is better than weird behavior.  Maybe we should
add a short note in a postgresql.conf comment to this effect, so that it
doesn't surprise anyone who deletes or comments out the line.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Kevin Grittner
Robert Haas  wrote:
 
> I think we should really consider replacing reltuples with
> reltupledensity at some point.  I continue to be afraid that using
> a decaying average in this case is going to end up overweighting
> the values from some portion of the table that's getting scanned
> repeatedly, at the expense of other portions of the table that are
> not getting scanned at all.  Now, perhaps experience will prove
> that's not a problem.  But storing relpages and reltupledensity
> separately would give us more flexibility, because we could feel
> free to bump relpages even when we're not sure what to do about
> reltupledensity.  That would help Florian's problem quite a lot,
> even if we did nothing else.
 
Given how trivial it would be to adjust reltuples to keep its ratio
to relpages about the same when we don't have a new "hard" number,
but some evidence that we should fudge our previous value, I don't
see where this change buys us much.  It seems to mostly obfuscate
the fact that we're changing our assumption about how many tuples we
have.  I would rather that we did that explicitly with code comments
about why it's justified than to slip it in the way you suggest.
 
-Kevin



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Tom Lane
Stephen Frost  writes:
> Basically, I added a ListCell array into the List structure and then
> added a bitmap to keep track of which positions in the array were
> filled.

Hm.  I've gotten the impression from previous testing that there are an
awful lot of extremely short lists (1 or 2 elements) running around in a
typical query.  (One source for those is the argument lists for
operators and functions.)  I'm worried that this type of approach would
bloat the storage required in those cases to a degree that would make
the patch unattractive.  ISTM the first thing we'd need to have before
we could think about this rationally is some measurements about the
frequencies of different List lengths in a typical workload.

When Neil redid the List infrastructure a few years ago, there was some
discussion of special-casing the very first ListCell, and allocating
just that cell along with the List header.  That'd be sort of the
minimal version of what you've done here, and would be guaranteed to
never eat any wasted space (since a list that has a header isn't empty).
We should probably compare the behavior of that minimalistic version to
versions with different sizes of preallocated arrays.

> An alternative approach that I was already considering would be to
> just allocate ListCell's in bulk (kind of a poor-man's slab allocator, I
> believe).  To do that we'd have to make the bitmap be a variable length
> array of bitmaps and then have a list of pointers to the ListCell block
> allocations.  Seems like that's probably overkill for this, however.

That would be pointing in the direction of trying to save space for very
long Lists, which is a use-case that I'm not sure occurs often enough
for us to be worth spending effort on, and in any case is a distinct
issue from that of saving palloc time for very short Lists.  Again, some
statistics about actual list lengths would be really nice to have ...
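
Collecting the list-length statistics asked for here amounts to a small histogram sampled wherever lists are destroyed; a sketch of the counting side (names invented, instrumentation only):

```c
/* Histogram of observed list lengths.  Lengths 0..MAXTRACK-1 each get
 * a bucket; anything longer is lumped into an overflow bucket.  One
 * could imagine calling record_list_length() from list_free() to
 * sample a workload. */
#define MAXTRACK 64

static long length_buckets[MAXTRACK];
static long overflow_bucket;

void record_list_length(int len)
{
	if (len >= 0 && len < MAXTRACK)
		length_buckets[len]++;
	else
		overflow_bucket++;
}

/* Fraction of sampled lists with length <= n, i.e. how many would fit
 * entirely in an n-cell preallocated array. */
double fraction_at_most(int n)
{
	long		total = overflow_bucket;
	long		within = 0;

	for (int i = 0; i < MAXTRACK; i++)
	{
		total += length_buckets[i];
		if (i <= n)
			within += length_buckets[i];
	}
	return total > 0 ? (double) within / (double) total : 0.0;
}
```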

regards, tom lane



Re: [HACKERS] LOCK DATABASE

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 12:28 PM, Ross J. Reedstrom  wrote:
> Perhaps the approach to restricting connections should not be a database
> object lock, but rather an admin function that does the equivalent of
> flipping datallowconn in pg_database?

To me, that seems like a better approach, although it's a little hard
to see how we'd address Alvaro's desire to have it roll back
automatically when the session disconnected.  The disconnect might be
caused by a FATAL error, for example.

I'm actually all in favor of doing more things via SQL rather than
configuration files.  The idea of some ALTER SYSTEM command seems very
compelling to me.  I just don't really like this particular
implementation, which to me seems far too bound up in implementation
details I'd rather not rely on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 12:23 PM, Tom Lane  wrote:
>> Another thought: Couldn't relation_needs_vacanalyze() just scale up
>> reltuples by the ratio of the current number of pages in the relation
>> to relpages, just as the query planner does?
>
> Hmm ... that would fix Florian's immediate issue, and it does seem like
> a good change on its own merits.  But it does nothing for the problem
> that we're failing to put the best available information into pg_class.
>
> Possibly we could compromise on doing just that much in the back
> branches, and the larger change for 9.1?

Do you think we need to worry about the extra overhead of determining
the current size of every relation as we sweep through pg_class?  It's
not a lot, but OTOH I think we'd be doing it once a minute... not sure
what would happen if there were tons of tables.

Going back to your thought upthread, I think we should really consider
replacing reltuples with reltupledensity at some point.  I continue to
be afraid that using a decaying average in this case is going to end
up overweighting the values from some portion of the table that's
getting scanned repeatedly, at the expense of other portions of the
table that are not getting scanned at all.  Now, perhaps experience
will prove that's not a problem.  But storing relpages and
reltupledensity separately would give us more flexibility, because we
could feel free to bump relpages even when we're not sure what to do
about reltupledensity.  That would help Florian's problem quite a lot,
even if we did nothing else.
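
The scaling being discussed -- treating the stored values as a density and extrapolating to the relation's current size, as the planner's size estimation does -- is just:

```c
/* Scale a stored reltuples estimate by the ratio of the relation's
 * current page count to the relpages value stored alongside it.
 * Names are illustrative; this is not the actual
 * relation_needs_vacanalyze() code. */
double extrapolated_reltuples(double reltuples, double relpages,
							  double cur_pages)
{
	if (relpages <= 0)
		return reltuples;		/* no stored density to scale from */
	return (reltuples / relpages) * cur_pages;
}
```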

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Stephen Frost
* Alvaro Herrera (alvhe...@commandprompt.com) wrote:
> I think what this patch is mainly missing is a description of how the
> new allocation is supposed to work, so that we can discuss the details
> without having to reverse-engineer them from the code.

Sure, sorry I didn't include something more descriptive previously.

Basically, I added a ListCell array into the List structure and then
added a bitmap to keep track of which positions in the array were
filled.  I added it as an array simply because makeNode() assumes the
size of a List is static and doesn't call through new_list() or
anything.  When a new ListCell is needed, it'll check if there's an
available spot in the array and use it if there is.  If there's no
more room left, it'll just fall back to doing a palloc() for the
ListCell.  On list_delete(), it'll free up the spot that was used by
that cell.  One caveat is that it won't try to clean up the used spots
on a list_truncate (since you'd have to traverse the entire list to
figure out if anything getting truncated off is using a spot in the
array).  Of course, if you list_truncate to zero, you'll just get NIL
back and the next round through will generate a whole new/empty List
structure for you.

An alternative approach that I was already considering would be to
just allocate ListCell's in bulk (kind of a poor-man's slab allocator, I
believe).  To do that we'd have to make the bitmap be a variable length
array of bitmaps and then have a list of pointers to the ListCell block
allocations.  Seems like that's probably overkill for this, however.
The idea for doing this was to address the case of small lists having to
go through the palloc() process over and over.  We'd be penalizing those
again if we add a lot of complexity so that very large lists don't have
to palloc() as much.
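
A stripped-down illustration of the scheme just described -- a fixed cell array embedded in the list header plus an occupancy bitmap, falling back to a separate allocation when the array fills.  All names are invented; the real patch works on PostgreSQL's List/ListCell and uses palloc:

```c
#include <stdlib.h>

#define NCACHE 8				/* preallocated cells per list header */

typedef struct MiniCell
{
	void	   *data;
	struct MiniCell *next;
} MiniCell;

typedef struct MiniList
{
	MiniCell   *head;
	MiniCell   *tail;
	unsigned	used_bitmap;	/* bit i set => cache[i] is in use */
	MiniCell	cache[NCACHE];	/* embedded cells, tried before malloc */
} MiniList;

/* Grab a cell: prefer a free slot in the embedded array, falling back
 * to a separate allocation (palloc in the real patch).  Note the
 * bookkeeping hazard mentioned above: a cell from the array must never
 * be handed to free()/pfree().  We assume malloc succeeds here. */
static MiniCell *alloc_cell(MiniList *list)
{
	for (int i = 0; i < NCACHE; i++)
	{
		if (!(list->used_bitmap & (1u << i)))
		{
			list->used_bitmap |= 1u << i;
			return &list->cache[i];
		}
	}
	return malloc(sizeof(MiniCell));
}

void mini_append(MiniList *list, void *data)
{
	MiniCell   *cell = alloc_cell(list);

	cell->data = data;
	cell->next = NULL;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}

int mini_length(const MiniList *list)
{
	int			n = 0;

	for (const MiniCell *c = list->head; c != NULL; c = c->next)
		n++;
	return n;
}
```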

Thanks

Stephen




Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-26 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Wed May 25 16:07:55 -0400 2011:
> Alvaro Herrera  writes:
> > Excerpts from Tom Lane's message of Tue May 24 17:11:17 -0400 2011:
> >> Right.  It would also increase the cognitive load on the user to have
> >> to remember the command-line go-to-line-number switch for his editor.
> >> So I don't particularly want to redesign this feature.  However, I can
> >> see the possible value of letting EDITOR_LINENUMBER_SWITCH be set from
> >> the same place that you set EDITOR, which would suggest that we allow
> >> the value to come from an environment variable.  I'm not sure whether
> >> there is merit in allowing both that source and ~/.psqlrc, though
> >> possibly for Windows users it might be easier if ~/.psqlrc worked.
> 
> > If we're going to increase the number of options in .psqlrc that do not
> > work with older psql versions, can I please have .psqlrc-MAJORVERSION or
> > some such?  Having 8.3's psql complain all the time because it doesn't
> > understand "linestyle" is annoying.
> 
> 1. I thought we already did have that.

Oh, true, we have that, though it's not very usable because you have to
rename the file from .psqlrc-9.0.3 to .psqlrc-9.0.4 when you upgrade,
which is kinda silly.

> 2. In any case, EDITOR_LINENUMBER_SWITCH isn't a hazard for this,
> because older versions will just think it's a variable without any
> special meaning.

Good point.

> But the real question here is whether we want to change it to be also
> (or instead?) an environment variable.

I vote yes.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-26 Thread Alvaro Herrera
Excerpts from Stephen Frost's message of Tue May 24 22:56:21 -0400 2011:
> Greetings,
> 
>   Someone (*cough*Haas*cough) made a claim over beers at PGCon that it
>   would be very difficult to come up with a way to pre-allocate List
>   entries and maintain the current List API.  I'll admit that it wasn't
>   quite as trivial as I had *hoped*, but attached is a proof-of-concept
>   patch which does it.

I think what this patch is mainly missing is a description of how the
new allocation is supposed to work, so that we can discuss the details
without having to reverse-engineer them from the code.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] LOCK DATABASE

2011-05-26 Thread Ross J. Reedstrom
On Thu, May 19, 2011 at 04:13:12PM -0400, Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of Thu May 19 15:32:57 -0400 2011:
> > 
> > That's a bit of a self-defeating argument though, since it implies
> > that the effect of taking an exclusive lock via LockSharedObject()
> > will not simply prevent new backends from connecting, but rather will
> > also block any backends already in the database that try to perform
> > one of those operations.
> 
> Well, the database that holds the lock is going to be able to run them,
> which makes sense -- and you probably don't want others doing it, which
> also does.  I mean other backends are still going to be able to run
> administrative tasks like slon and so on, just not modifying the
> database.  If they want to change the comments they can do so after
> you're done with your lock.
> 
> Tom has a point though and so does Chris.  I'm gonna put this topic to
> sleep though, 'cause I sure don't want to be seen like I'm proposing a
> connection pooler in the backend.

I know I'm late to this party, but just wanted to chime in with support
for the idea that access to a particular database is properly within the
scope of a DBA, and it would be good for it not to require
filesystem/sysadmin action. It seems to me to be proper server-side
support for poolers, shared hosting setups, and other use cases,
without going whole hog. Arguably it would also require versions of
pg_cancel_backend and pg_terminate_backend that work for the database
owner as well as the superuser.

Perhaps the approach to restricting connections should not be a database
object lock, but rather an admin function that does the equivalent of
flipping datallowconn in pg_database?

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientist    phone: 713-348-6166
Connexions  http://cnx.org                      fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Tom Lane
Robert Haas  writes:
> I would feel a lot better about something that is deterministic, like,
> I dunno, if VACUUM visits more than 25% of the table, we use its
> estimate.  And we always use ANALYZE's estimate.  Or something.

This argument seems to rather miss the point.  The data we are working
from is fundamentally not deterministic, and you can't make it so by
deciding to ignore what data we do have.  That leads to a less useful
estimate, not a more useful estimate.

> Another thought: Couldn't relation_needs_vacanalyze() just scale up
> reltuples by the ratio of the current number of pages in the relation
> to relpages, just as the query planner does?

Hmm ... that would fix Florian's immediate issue, and it does seem like
a good change on its own merits.  But it does nothing for the problem
that we're failing to put the best available information into pg_class.

Possibly we could compromise on doing just that much in the back
branches, and the larger change for 9.1?

regards, tom lane



Re: [HACKERS] "errno" not set in case of "libm" functions (HPUX)

2011-05-26 Thread Tom Lane
Ibrar Ahmed  writes:
> Please find the updated patch. I have added this "+Olibmerrno" compile flag
> check in configure/configure.in file.

I tried this on my HP-UX 10.20 box, and it didn't work very nicely:
configure decided that the compiler accepted +Olibmerrno, so I got a
compile full of
cc: warning 450: Unrecognized option +Olibmerrno.
warnings.  The reason is that PGAC_PROG_CC_CFLAGS_OPT does not pay any
attention to whether the proposed flag generates a warning.  That seems
like a bug --- is there any situation where we'd want to accept a flag
that does generate a warning?  I'm thinking that macro should set
ac_c_werror_flag=yes, the same way PGAC_C_INLINE does.

regards, tom lane



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 11:25 AM, Tom Lane  wrote:
> I'm still of the opinion that an incremental estimation process like
> the above is a lot saner than what we're doing now, snarky Dilbert
> references notwithstanding.  The only thing that seems worthy of debate
> from here is whether we should trust ANALYZE's estimates a bit more than
> VACUUM's estimates, on the grounds that the former are more likely to be
> from a random subset of pages.  We could implement that by applying a
> fudge factor when folding a VACUUM estimate into the moving average (ie,
> multiply its reliability by something less than one).  I don't have any
> principled suggestion for just what the fudge factor ought to be, except
> that I don't think "zero" is the best value, which AFAICT is what Robert
> is arguing.  I think Greg's argument shows that "one" is the right value
> when dealing with an ANALYZE estimate, if you believe that ANALYZE saw a
> random set of pages ... but using that for VACUUM does seem
> overoptimistic.

The problem is that it's quite difficult to predict what the relative
frequency of full-relation-vacuum, vacuum-with-skips, and ANALYZE
operations on the table will be.  It matters how fast the table is
being inserted into vs. updated/deleted; and it also matters how fast
the table is being updated compared with the system's rate of XID
consumption.  So in general it seems hard to say, well, we know this
number might drift off course a little bit, but there will be a
freezing vacuum or analyze or something coming along soon enough to
fix the problem.  There might be, but it's difficult to be sure.  My
argument isn't so much that using a non-zero value here is guaranteed
to have bad effects, but that we really have no idea what will work
out well in practice, and therefore it seems dangerous to whack the
behavior around ... especially in stable branches.

If we changed this in 9.1, and that's the last time we ever get a
complaint about it, problem solved.  But I would feel bad if we
changed this in the back-branches and then found that, while solving
this particular problem, we had created others.  It also seems likely
that the replacement problems would be more subtle and more difficult
to diagnose, because they'd depend in a very complicated way on the
workload, and having, say, the latest table contents would not
necessarily enable us to reproduce the problem.

I would feel a lot better about something that is deterministic, like,
I dunno, if VACUUM visits more than 25% of the table, we use its
estimate.  And we always use ANALYZE's estimate.  Or something.

Another thought: Couldn't relation_needs_vacanalyze() just scale up
reltuples by the ratio of the current number of pages in the relation
to relpages, just as the query planner does?
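That extrapolation is simple enough to state in a line; a hypothetical sketch (not the actual relation_needs_vacanalyze() code):

```c
#include <assert.h>

/* Scale the stored reltuples by the ratio of the relation's current
 * page count to the stored relpages, as the planner's estimate does.
 * Illustrative sketch only. */
static double
scaled_reltuples(double reltuples, double relpages, double curpages)
{
    if (relpages <= 0)
        return reltuples;            /* no stored basis to scale by */
    return reltuples * (curpages / relpages);
}
```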

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

2011-05-26 Thread Tom Lane
Greg Stark  writes:
> On Wed, May 25, 2011 at 9:41 AM, Tom Lane  wrote:
>> ... What I'm currently imagining is
>> to do a smoothed moving average, where we factor in the new density
>> estimate with a weight dependent on the percentage of the table we did
>> scan.  That is, the calculation goes something like
>> 
>> old_density = old_reltuples / old_relpages
>> new_density = counted_tuples / scanned_pages
>> reliability = scanned_pages / new_relpages
>> updated_density = old_density + (new_density - old_density) * reliability
>> new_reltuples = updated_density * new_relpages

> This amounts to assuming that the pages observed in the vacuum have
> the density observed and the pages that weren't seen have the density
> that were previously in the reltuples/relpages stats. That seems like
> a pretty solid approach to me. If the numbers were sane before it
> follows that they should be sane after the update.

Hm, that's an interesting way of looking at it, but I was coming at it
from a signal-processing point of view.  What Robert is concerned about
is that if VACUUM is cleaning a non-representative sample of pages, and
repeated VACUUMs examine pretty much the same sample each time, then
over repeated applications of the above formula the estimated density
will eventually converge to what we are seeing in the sample.  The speed
of convergence depends on the moving-average multiplier, ie the
"reliability" number above, and what I was after was just to slow down
convergence for smaller samples.  So I wouldn't have any problem with
including a fudge factor to make the convergence even slower.  But your
analogy makes it seem like this particular formulation is actually
"right" in some sense.

One other point here is that Florian's problem is really only with our
failing to update relpages.  I don't think there is any part of the
system that particularly cares about reltuples for a toast table.  So
even if the value did converge to some significantly-bad estimate over
time, it's not really an issue AFAICS.  We do care about having a sane
reltuples estimate for regular tables, but for those we should have a
mixture of updates from ANALYZE and updates from VACUUM.  Also, for both
regular and toast tables we will have an occasional vacuum-for-wraparound
that is guaranteed to scan all pages and hence do a hard reset of
reltuples to the correct value.

I'm still of the opinion that an incremental estimation process like
the above is a lot saner than what we're doing now, snarky Dilbert
references notwithstanding.  The only thing that seems worthy of debate
from here is whether we should trust ANALYZE's estimates a bit more than
VACUUM's estimates, on the grounds that the former are more likely to be
from a random subset of pages.  We could implement that by applying a
fudge factor when folding a VACUUM estimate into the moving average (ie,
multiply its reliability by something less than one).  I don't have any
principled suggestion for just what the fudge factor ought to be, except
that I don't think "zero" is the best value, which AFAICT is what Robert
is arguing.  I think Greg's argument shows that "one" is the right value
when dealing with an ANALYZE estimate, if you believe that ANALYZE saw a
random set of pages ... but using that for VACUUM does seem
overoptimistic.
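For concreteness, the incremental update rule quoted above can be written out as a small function; the name and signature here are illustrative, not the backend's actual code:

```c
#include <assert.h>
#include <math.h>

/* Smoothed moving-average density update, per the formulas quoted above.
 * A fudge factor below 1.0 could be multiplied into "reliability" to make
 * VACUUM-based estimates converge more slowly than ANALYZE-based ones. */
static double
update_reltuples(double old_reltuples, double old_relpages,
                 double counted_tuples, double scanned_pages,
                 double new_relpages)
{
    double      old_density = old_reltuples / old_relpages;
    double      new_density = counted_tuples / scanned_pages;
    double      reliability = scanned_pages / new_relpages;
    double      updated_density =
        old_density + (new_density - old_density) * reliability;

    return updated_density * new_relpages;
}
```

A full scan (scanned_pages == new_relpages) gives reliability 1.0 and hence a hard reset to the counted density, matching the anti-wraparound-vacuum observation above; a small scan only nudges the old estimate.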

regards, tom lane



[HACKERS] patch for distinguishing PG instances in event log

2011-05-26 Thread MauMau

Hello,

I wrote and attached a patch for the TODO item below (which I proposed).

Allow multiple Postgres clusters running on the same machine to distinguish 
themselves in the event log

http://archives.postgresql.org/pgsql-hackers/2011-03/msg01297.php
http://archives.postgresql.org/pgsql-hackers/2011-05/msg00574.php

I changed two things from the original proposal.

1. regsvr32.exe needs /n when you specify event source
I described the reason in src/bin/pgevent/pgevent.c.

2. I moved the article on event log registration to a more suitable place
The traditional place and what I originally proposed were not ideal, because
those who don't build from source won't read those places.


I successfully tested event log registration/unregistration, event logging 
with/without event_source parameter, and SHOWing event_source parameter with 
psql on Windows Vista (32-bit). I would appreciate it if someone who has a
64-bit Windows environment could test it there.


I'll add this patch to the first CommitFest of 9.2. Thank you in advance for 
reviewing it.


Regards
MauMau


multi_event_source.patch
Description: Binary data



Re: [HACKERS] Should partial dumps include extensions?

2011-05-26 Thread Tom Lane
Peter Eisentraut  writes:
> On Tue, 2011-05-24 at 23:26 -0400, Robert Haas wrote:
>> On Tue, May 24, 2011 at 4:44 PM, Tom Lane  wrote:
>>> There's a complaint here
>>> http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php
>>> about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands
>>> for all loaded extensions.  Should we change that?  A reasonable
>>> compromise might be to suppress extensions in the same cases where we
>>> suppress procedural languages, ie if --schema or --table was used
>>> (see "include_everything" switch in pg_dump.c).

>> Making it work like procedural languages seems sensible to me.

> The same problem still exists for foreign data wrappers, servers, and
> user mappings.  It should probably be changed in the same way.

No objection here, but I'm not going to go do it ...

regards, tom lane



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 8:57 AM, Pavan Deolasee
 wrote:
> On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
>  wrote:
>> On Thu, May 26, 2011 at 9:40 AM, Robert Haas  wrote:
>>
>>> Currently, I believe the only way a page can get marked all-visible is
>>> by vacuum.  But if we make this change, then it would be possible for
>>> a HOT cleanup to encounter a situation where all-visible could be set.
>>>  We probably want to make that work.
>>>
>>
>> Yes. That's certainly an option.
>
> BTW, I just realized that this design would expect the visibility map
> to be always correct or at least it should always correctly report a
> page having dead line pointers. We would expect the index vacuum to
> clean  index pointers to *all* dead line pointers because once the
> index vacuum is complete, other backends or next heap vacuum may
> remove any of those old dead line pointers assuming that index vacuum
> would have taken care of the index pointers.
>
> IOW, the visibility map bit must always be clear when there are dead
> line pointers on the page. Do we guarantee that today ? I think we do,
> but the comment in the source file is not affirmative.

It can end up in the wrong state after a crash.  I have a patch to try
to fix that, but I need someone to review it.  (*looks meaningfully at
Heikki, coughs loudly*)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 6:40 AM, Pavan Deolasee
 wrote:
>>> There are some other issues that we should think about too. Like
>>> recording free space  and managing visibility map. The free space is
>>> recorded in the second pass pass today, but I don't see any reason why
>>> that can't be moved to the first pass. Its not clear though if we
>>> should also record free space after retail page vacuum or leave it as
>>> it is.
>>
>> Not sure.  Any idea why it's like that, or why we might want to change it?
>
> I think it precedes the HOT days when the dead space was reclaimed
> only during the second scan. Even post-HOT, if we know we would
> revisit the page anyways during the second scan, it makes sense to
> delay recording free space because the dead line pointers can add to
> it (if they are towards the end of the line pointer array). I remember
> discussing this briefly during HOT, but can't recollect why we decided
> not to update the FSM after retail vacuum. But the entire focus then
> was to keep things simple and that could be one reason.

It's important to keep in mind that page-at-a-time vacuum is happening
in the middle of a routine INSERT/UPDATE/DELETE operation, so we don't
want to do anything too expensive there.  Whether updating the FSM
falls into that category or not, I am not sure.

>> Currently, I believe the only way a page can get marked all-visible is
>> by vacuum.  But if we make this change, then it would be possible for
>> a HOT cleanup to encounter a situation where all-visible could be set.
>>  We probably want to make that work.
>
> Yes. That's certainly an option.
>
> We did not discuss where to store the information about the start-LSN
> of the last successful index vacuum. I am thinking about a new
> pg_class attribute, just because I can't think of anything better. Any
> suggestion ?

That seems fairly grotty, but I don't have a lot of brilliant ideas.
One possibility that occurred to me was to stick it in the special
space on the first page of the relation.  But that would mean that
every HOT cleanup would need to look at that page, which seems poor.
Even if we cached it after the first access, it still seems kinda
poor.  But it would make the unlogged case easier to handle...  and we
have thought previously about including some metadata in the relation
file itself to help with forensics (which table was this, anyway?).
So I don't know.

> Also for the first version, I wonder if we should let the unlogged and
> temp tables to be handled by the usual two pass vacuum. Once we have
> proven that one pass is better, we will extend that to other tables as
> discussed on this thread.

We can certainly do that for testing.  Whether we want to commit it
that way, I'm not sure.

> Do we need a modified syntax for vacuum, like "VACUUM mytab SKIP
> INDEX" or something similar ? That way, user can just vacuum the heap
> if she wishes so and can also help us with testing.

There's an extensible-options syntax you can use... VACUUM (index off) mytab.

> Do we need more autovacuum tuning parameters to control when to vacuum
> just the heap and when to vacuum the index as well ? Again, we can
> discuss and decide this later, but just wanted to mention this here.

Let's make tuning that a separate effort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Pavan Deolasee
On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
 wrote:
> On Thu, May 26, 2011 at 9:40 AM, Robert Haas  wrote:
>
>> Currently, I believe the only way a page can get marked all-visible is
>> by vacuum.  But if we make this change, then it would be possible for
>> a HOT cleanup to encounter a situation where all-visible could be set.
>>  We probably want to make that work.
>>
>
> Yes. That's certainly an option.
>

BTW, I just realized that this design would expect the visibility map
to be always correct or at least it should always correctly report a
page having dead line pointers. We would expect the index vacuum to
clean  index pointers to *all* dead line pointers because once the
index vacuum is complete, other backends or next heap vacuum may
remove any of those old dead line pointers assuming that index vacuum
would have taken care of the index pointers.

IOW, the visibility map bit must always be clear when there are dead
line pointers on the page. Do we guarantee that today ? I think we do,
but the comment in the source file is not affirmative.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Should partial dumps include extensions?

2011-05-26 Thread Peter Eisentraut
On Tue, 2011-05-24 at 23:26 -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 4:44 PM, Tom Lane  wrote:
> > There's a complaint here
> > http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php
> > about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands
> > for all loaded extensions.  Should we change that?  A reasonable
> > compromise might be to suppress extensions in the same cases where we
> > suppress procedural languages, ie if --schema or --table was used
> > (see "include_everything" switch in pg_dump.c).
> 
> Making it work like procedural languages seems sensible to me.

The same problem still exists for foreign data wrappers, servers, and
user mappings.  It should probably be changed in the same way.




Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-05-26 Thread Dave Page
On Thu, May 26, 2011 at 11:58 AM, Peter Geoghegan  wrote:
> Attached revision doesn't use any threads or pipes on win32. It's far
> neater there. I'm still seeing that "lagger" process (which is an
> overstatement) at times, so I guess it is normal. On Windows, there is
> no detailed PS output, so I actually don't know what the lagger
> process is, and no easy way to determine that immediately occurs to
> me.

Process Explorer might help you there:
http://technet.microsoft.com/en-us/sysinternals/bb896653



-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-05-26 Thread Peter Geoghegan
On 26 May 2011 11:22, Heikki Linnakangas
 wrote:
> The Unix-stuff looks good to me at a first glance.

Good.

> There's one reference left to "life sign" in comments. (FWIW, I don't have a
> problem with that terminology myself)

Should have caught that one. Removed.

> Looking at the MSDN docs again, can't you simply include PostmasterHandle in
> the WaitForMultipleObjects() call to have it return when the process dies?
> It should be possible to mix different kind of handles in one call,
> including process handles. Does it not work as advertised?

Uh, I might have done that, had I been aware of PostmasterHandle. I
tried various convoluted ways to make it do what ReadFile() did for
me, before finally biting the bullet and just using ReadFile() in a
separate thread. I've tried adding PostmasterHandle though, and it
works well - it appears to behave exactly the same as my original
implementation.

This simplifies things considerably. Now, on win32, things are
actually simpler than on Unix.

>> You'll see that it takes about a second for the archiver to exit. All
>> processes exit.
>
> Hmm, shouldn't the archiver exit almost instantaneously now that there's no
> polling anymore?

Actually, just one "lagger" process sometimes remains that takes maybe
as long as a second, a bit longer than the others. I assumed that it
was the archiver, but I was probably wrong. I also didn't see that
very consistently.

Attached revision doesn't use any threads or pipes on win32. It's far
neater there. I'm still seeing that "lagger" process (which is an
overstatement) at times, so I guess it is normal. On Windows, there is
no detailed PS output, so I actually don't know what the lagger
process is, and no easy way to determine that immediately occurs to
me.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..b1d38f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10150,7 +10150,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, 500L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 500L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..fa1d382 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -94,6 +94,7 @@
 
 #include "miscadmin.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -108,6 +109,15 @@ static void initSelfPipe(void);
 static void drainSelfPipe(void);
 static void sendSelfPipeByte(void);
 
+/* 
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of 
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+
+extern int postmaster_alive_fds[2];
 
 /*
  * Initialize a backend-local latch.
@@ -188,22 +198,22 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +221,13 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000L;
 		tv.tv_usec = (timeout % 1000L) * 1000L;
@@ -224,7 +235,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{

[HACKERS] Database research papers

2011-05-26 Thread Pavan Deolasee
Just a trivia. I remember spending weeks on reading the ARIES paper
during my post graduation and I loved the depth of knowledge in that
paper.  In fact, if I re-read it again now, I would appreciate it even
more. Are there other papers in the same league, especially ones that
are more closely related to the PostgreSQL implementation ?

http://www.almaden.ibm.com/u/mohan/RJ6649Rev.pdf

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Pavan Deolasee
On Thu, May 26, 2011 at 9:40 AM, Robert Haas  wrote:
> On Wed, May 25, 2011 at 11:51 PM, Pavan Deolasee
>> Having said that, it doesn't excite me too much because I
>> think we should do the dead line pointer reclaim operation during page
>> pruning and we are already holding cleanup lock at that time and most
>> likely do a reshuffle anyways.
>
> I'll give that a firm maybe.  If there is no reshuffle, then you can
> do this with just an exclusive content lock.  Maybe that's worthless,
> but I'm not certain of it.  I guess we might need to see how the code
> shakes out.
>

Yeah, once we start working on it, we might have a better idea.

> Also, reshuffling might be more expensive.  I agree that if there are
> new dead tuples on the page, then you're going to be paying that price
> anyway; but if not, it might be avoidable.
>

Yeah. We can tackle this later. As you suggested, maybe we can start
with something simpler and then see if we need to do more.

>
>> There are some other issues that we should think about too, like
>> recording free space and managing the visibility map. The free space is
>> recorded in the second pass today, but I don't see any reason why
>> that can't be moved to the first pass. It's not clear, though, whether we
>> should also record free space after a retail page vacuum or leave it as
>> it is.
>
> Not sure.  Any idea why it's like that, or why we might want to change it?
>

I think it predates the HOT days, when dead space was reclaimed
only during the second scan. Even post-HOT, if we know we would
revisit the page anyway during the second scan, it makes sense to
delay recording free space because the dead line pointers can add to
it (if they are towards the end of the line pointer array). I remember
discussing this briefly during HOT, but I can't recollect why we decided
not to update the FSM after a retail vacuum. The entire focus then
was to keep things simple, and that could be one reason.

> Currently, I believe the only way a page can get marked all-visible is
> by vacuum.  But if we make this change, then it would be possible for
> a HOT cleanup to encounter a situation where all-visible could be set.
>  We probably want to make that work.
>

Yes, that's certainly an option.

We did not discuss where to store the information about the start-LSN
of the last successful index vacuum. I am thinking about a new
pg_class attribute, just because I can't think of anything better. Any
suggestions?

Also, for the first version, I wonder if we should let unlogged and
temp tables be handled by the usual two-pass vacuum. Once we have
proven that one pass is better, we can extend it to other tables as
discussed on this thread.

Do we need a modified syntax for vacuum, like "VACUUM mytab SKIP
INDEX" or something similar? That way, a user can vacuum just the heap
if she wishes, and it can also help us with testing.

Do we need more autovacuum tuning parameters to control when to vacuum
just the heap and when to vacuum the index as well? Again, we can
discuss and decide this later, but I just wanted to mention it here.

So, are there any other objections or suggestions? Does anyone else care
to look at the brief design we discussed above? Otherwise, I will go
ahead and work on this in the coming days. Of course, I will keep
the list posted about any new issues that I see.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-05-26 Thread Heikki Linnakangas

On 24.05.2011 23:43, Peter Geoghegan wrote:

Attached is the latest revision of the latch implementation that
monitors postmaster death, plus the archiver client that now relies on
that new functionality and thereby works well without a tight
PostmasterIsAlive() polling loop.


The Unix-stuff looks good to me at a first glance.


The lifesign terminology has been dropped. We now close() the file
descriptor that represents "ownership" - the write end of our
anonymous pipe - in each child backend directly in the forking
machinery (the thin fork() wrapper for the non-EXEC_BACKEND case),
through a call to ReleasePostmasterDeathWatchHandle(). We don't have
to do that on Windows, and we don't.


There's one reference left to "life sign" in comments. (FWIW, I don't 
have a problem with that terminology myself)



Disappointingly, and despite a big effort, there doesn't seem to be a
way to have the win32 WaitForMultipleObjects() call wake on postmaster
death in addition to everything else in the same way that select()
does, so there are now two blocking calls, each in a thread of its own
(when the latch code is interested in postmaster death - otherwise,
it's single threaded as before).

The threading stuff (in particular, the fact that we used a named pipe
in a thread where the name of the pipe comes from the process PID) is
inspired by win32 signal emulation, src/backend/port/win32/signal.c .


That's a pity; all those threads and named pipes are a bit gross for a 
safety mechanism like this.


Looking at the MSDN docs again, can't you simply include 
PostmasterHandle in the WaitForMultipleObjects() call to have it return 
when the process dies? It should be possible to mix different kind of 
handles in one call, including process handles. Does it not work as 
advertised?



You can easily observe that it works as advertised on Windows by
starting Postgres with archiving, using Task Manager to monitor
processes, and doing the following to the postmaster (assuming it has
a PID of 1234). This is the Windows equivalent of kill -9:

C:\Users\Peter>taskkill /pid 1234 /F

You'll see that it takes about a second for the archiver to exit. All
processes exit.


Hmm, shouldn't the archiver exit almost instantaneously now that there's 
no polling anymore?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Re: Latch implementation that wakes on postmaster death on both win32 and Unix

2011-05-26 Thread Peter Geoghegan
I'm a bit disappointed that no one has commented on this yet. I would
have appreciated some preliminary feedback.

Anyway, I've added it to CommitFest 2011-06.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] The way to know whether the standby has caught up with the master

2011-05-26 Thread Fujii Masao
On Wed, May 25, 2011 at 11:07 PM, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>> On 25.05.2011 07:42, Fujii Masao wrote:
>>> To achieve that, I'm thinking to change walsender so that, when the standby
>>> has caught up with the master, it sends back a message indicating that to
>>> the standby. And I'm thinking to add a new function (or a view like
>>> pg_stat_replication) available on the standby, which shows that info.
>
>> By the time the standby has received that message, it might not be
>> caught-up anymore because new WAL might've been generated in the master
>> already.
>
> Even assuming that you believe this is a useful capability, there is no
> need to change walsender.  It *already* sends the current-end-of-WAL in
> every message, which indicates precisely whether the message contains
> all of available WAL data.

That's not enough to determine whether failover is safe. Even if the
standby's flush location is equal to the master's current end location, new
WAL might have already been generated, and the "success" indication of
the corresponding transaction might have been returned to the client (this
is possible only in async mode). So in addition to the master's current
end location, the standby must know its sync mode, which walsender would
need to send.

Another problem is that, when we can safely promote the standby, the
standby's flush location isn't always equal to the master's current end
location. Imagine the case where there is some unsent WAL on the master
and the corresponding transactions are waiting for replication. In this case,
those locations are obviously not the same. But in sync replication, we can
guarantee that all committed (from the client's view) transactions have
been replicated to the standby, so failover is safe.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] The way to know whether the standby has caught up with the master

2011-05-26 Thread Fujii Masao
On Wed, May 25, 2011 at 3:11 PM, Jaime Casanova  wrote:
> On Wed, May 25, 2011 at 12:28 AM, Fujii Masao  wrote:
>> On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
>>> By the time the standby has received that message, it might not be caught-up
>>> anymore because new WAL might've been generated in the master already.
>>
>> Right. But, thanks to sync rep, until such a new WAL has been replicated to
>> the standby, the commit of transaction is not visible to the client. So, 
>> even if
>> there are some WAL not replicated to the standby, the clusterware can promote
>> the standby safely without any data loss (to the client point of view), I 
>> think.
>
> then, you also need to transmit to the standby if it is the current
> sync standby.

Yes. After further thought, we can promote the standby safely only when the
corresponding walsender meets the following conditions:

1. sync_state is "sync".
2. The standby's flush_location is greater than or equal to the smallest
   wait location in the sync rep queue, which guarantees that all committed
   transactions (i.e., those whose "success" indications have been returned
   to the client) have been replicated to the standby.

Once the above conditions are satisfied, failover is safe until sync_state
is flipped to "async". Using this logic, walsender would check whether
failover is safe and send a message according to the result.

One problem is that, when sync_state is flipped to "async", walsender might
perform replication asynchronously before the standby receives the message
indicating that failover is unsafe. In this case, if the master crashes,
the clusterware would wrongly think that failover is safe and promote the
standby, causing data loss.

To solve this problem, walsender would need to send that message
*synchronously*, i.e., wait for the ACK of the message to arrive from the
standby before actually changing sync_state to "async".

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] SSI predicate locking on heap -- tuple or row?

2011-05-26 Thread Heikki Linnakangas

On 26.05.2011 06:19, Kevin Grittner wrote:

Dan and I went around a couple times chasing down all code, comment,
and patch changes needed, resulting in the attached patch.  We found
and fixed the bug which originally manifested in a way which I
confused with a need for row locks, as well as another which was
nearby in the code.  We backed out the changes which were causing
merge problems for Robert, as those were part of the attempt at the
row locking (versus tuple locking).  We removed a function which is
no longer needed.  We adjusted the comments and an affected isolation
test.


Could you explain in the README, why it is safe to only take the lock on 
the visible row version, please? It's not quite obvious, as we've seen 
from this discussion, and if I understood correctly the academic papers 
don't touch that subject either.



As might be expected from removing an unnecessary feature, the lines
of code went down -- a net decrease of 93 lines.


That's the kind of patch I like :-).


These changes generate merge conflicts with the work I've done on
handling CLUSTER, DROP INDEX, etc.  It seems to me that the best
course would be to commit this, then I can rebase the other work and
post it.  Since these issues are orthogonal, it didn't seem like a
good idea to combine them in one patch, and this one seems more
urgent.


Agreed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
