Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Gokulakannan Somasundaram
Yeah, I looked at it. I don't think it addresses the problem raised here.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php

Or maybe I am missing something.

Thanks.

On Wed, Mar 23, 2011 at 7:54 PM, Robert Haas  wrote:

> On Wed, Mar 23, 2011 at 2:29 AM, Gokulakannan Somasundaram
>  wrote:
> >> All operations that clear the bit area are already WAL-logged.
> >>
> > Is that the case with the visibility map also?
> > Thanks.
>
> Yes.  Look at the comment that the patch removes.  That describes the
> problem being fixed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Tom Lane
Andrew Hammond  writes:
> I'm building some partitioning support functions. I'm working on writing one
> called clone_indexes_to_partition right now. The idea is to take all the
> indexes applied to the parent and create a matching index on the child. Is
> there existing code that generates a CREATE INDEX statement given an
> indexrelid?

pg_get_indexdef

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Cédric Villemain
2011/3/24 Andrew Hammond :
> I'm building some partitioning support functions. I'm working on writing one
> called clone_indexes_to_partition right now. The idea is to take all the
> indexes applied to the parent and create a matching index on the child. Is
> there existing code that generates a CREATE INDEX statement given
> an indexrelid?

Yes, there is:
http://www.postgresql.org/docs/9.0/static/functions-info.html
pg_get_indexdef(index_oid): get the CREATE INDEX command for an index

> Andrew



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Jaime Casanova
On Wed, Mar 23, 2011 at 6:50 PM, Cédric Villemain
 wrote:
>
> This:
> git log --pretty=oneline -S'pg_exec_query_string' origin/REL7_4_STABLE
> git show  de28dc9a04c4df5d711815b7a518501b43535a26
>
> gives me the answer:
>

ah! OK, so the problem was the way I was asking git for the info...
that -S option looks like a life saver, thanks!
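For readers unfamiliar with the pickaxe, its behavior can be sketched in a throwaway repository. This assumes `git` is on PATH; the file name and commit messages are invented for illustration. Only the commits that change the number of occurrences of the searched string show up, which is exactly how it finds a rename like pg_exec_query_string -> exec_simple_query:

```python
import subprocess, tempfile
from pathlib import Path

def run(args, cwd):
    """Run a command in cwd and return its stdout as text, raising on failure."""
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

with tempfile.TemporaryDirectory() as repo:
    run(["git", "init", "-q"], repo)
    run(["git", "config", "user.email", "demo@example.com"], repo)
    run(["git", "config", "user.name", "Demo"], repo)
    src = Path(repo, "postgres.c")

    # Commit 1 introduces the symbol.
    src.write_text("void pg_exec_query_string(void) {}\n")
    run(["git", "add", "postgres.c"], repo)
    run(["git", "commit", "-q", "-m", "add pg_exec_query_string"], repo)

    # Commit 2 renames it away; a plain log of the file lists this commit,
    # but only -S ties it to the disappearance of the symbol.
    src.write_text("void exec_simple_query(void) {}\n")
    run(["git", "commit", "-q", "-am", "rename to exec_simple_query"], repo)

    # Commit 3 is unrelated noise that -S should skip.
    src.write_text(src.read_text() + "/* tweak */\n")
    run(["git", "commit", "-q", "-am", "unrelated tweak"], repo)

    # The pickaxe: list only commits that change the occurrence count
    # of the string -- the one that added it and the one that removed it.
    log = run(["git", "log", "--pretty=oneline", "-Spg_exec_query_string"], repo)

print(len(log.strip().splitlines()))  # 2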

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL



[HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Andrew Hammond
I'm building some partitioning support functions. I'm working on writing one
called clone_indexes_to_partition right now. The idea is to take all the
indexes applied to the parent and create a matching index on the child. Is
there existing code that generates a CREATE INDEX statement given an
indexrelid?

Andrew


Re: [HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Cédric Villemain
2011/3/23 Jaime Casanova :
> Hi,
>
> I'm looking for the history of pg_exec_query_string() and found that
> it disappeared in 7.4, so I tried "git log" to find out whether it
> was renamed or removed completely, and found only these 3 commits:
> http://git.postgresql.org/gitweb?p=postgresql.git&a=search&h=cd48aa0e9636c6225d34f366a9d71e7fc118c9f3&st=commit&s=pg_exec_query_string,
> and none of them removes or renames it...
>
> I'm pretty sure exec_simple_query() is what we used to call
> pg_exec_query_string(), but I had to look for functions with similar
> names and then compare the code; not very elegant :(
>
> Any idea what happened to this part of the history?

This:
git log --pretty=oneline -S'pg_exec_query_string' origin/REL7_4_STABLE
git show  de28dc9a04c4df5d711815b7a518501b43535a26

gives me the answer:

commit de28dc9a04c4df5d711815b7a518501b43535a26
Author: Tom Lane 
Date:   Fri May 2 20:54:36 2003 +

Portal and memory management infrastructure for extended query protocol.
Both plannable queries and utility commands are now always executed
within Portals, which have been revamped so that they can handle the
load (they used to be good only for single SELECT queries).  Restructure
code to push command-completion-tag selection logic out of postgres.c,
so that it won't have to be duplicated between simple and extended queries.
initdb forced due to addition of a field to Query nodes.


wonderful git :)

>
> --
> Jaime Casanova         www.2ndQuadrant.com
> Professional PostgreSQL: Soporte y capacitación de PostgreSQL
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



[HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Jaime Casanova
Hi,

I'm looking for the history of pg_exec_query_string() and found that
it disappeared in 7.4, so I tried "git log" to find out whether it
was renamed or removed completely, and found only these 3 commits:
http://git.postgresql.org/gitweb?p=postgresql.git&a=search&h=cd48aa0e9636c6225d34f366a9d71e7fc118c9f3&st=commit&s=pg_exec_query_string,
and none of them removes or renames it...

I'm pretty sure exec_simple_query() is what we used to call
pg_exec_query_string(), but I had to look for functions with similar
names and then compare the code; not very elegant :(

Any idea what happened to this part of the history?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Tom Lane
Robert Haas  writes:
> It looks like the only way anything can ever get put on the free list
> right now is if a relation or database is dropped.  That doesn't seem
> too good.

Why not?  AIUI the free list is only for buffers that are totally dead,
ie contain no info that's possibly of interest to anybody.  It is *not*
meant to substitute for running the clock sweep when you have to discard
a live buffer.

regards, tom lane



[HACKERS] Lack of post creation hook on extension

2011-03-23 Thread Kohei KaiGai
I found a problem: extension.c does not invoke the post-creation hook
at creation time, although no module currently supports assigning
security labels to extension objects.

The attached patch tries to fix it.

Thanks,
-- 
KaiGai Kohei 


pgsql-extension-hook.patch
Description: Binary data



[HACKERS] PG Session #2 : Call For Papers

2011-03-23 Thread damien clochard
Hi,

The first PostgreSQL Session, organized by Dalibo last February, was
a big success: more than 80 participants from the public and private
sectors gathered in Paris.

Dalibo and Oslandia want to build on this success and set up a
new conference day dedicated to PostGIS, on June 23rd in Paris.

http://www.postgresql-sessions.org/en/2/

The call for papers for this second session is now open. Here are some
expected topics (non-exhaustive list):

  * Feedback on information-system architectures using PostGIS in
specific contexts ("exotic" constraints, high volumes…)
  * Decision process for choosing a GIS database
  * Migration from another spatial database
  * PostGIS 2.0 new features, WKT Raster
  * Interoperability between PostGIS 2.0 and other software
  * Links between PostgreSQL and PostGIS, and planned evolution
  * etc.

Talks will last 45 minutes, including a 15-minute question-and-answer session.

Thank you for sending the following elements to
contact@postgresql-sessions.org before April 22nd:
  * Presentation title
  * Name of speaker(s)
  * Summary of presentation (100-250 words)

For any questions, do not hesitate to contact us:
  cont...@postgresql-sessions.org.

If you have any friends or colleagues that you think would be interested
in giving a talk, please forward this message to them!

PS: Slides from the first PG Session are available at:

http://www.postgresql-sessions.org/en/1/


-- 
damien clochard
dalibo.com | dalibo.org



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Pavel Stehule
2011/3/23 Alvaro Herrera :
> Excerpts from Robert Haas's message of Wed Mar 23 17:24:59 -0300 2011:
>> On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle  wrote:
>> > It struck me today again that \dt+ isn't displaying the accurate table size
>> > for tables, since it uses pg_relation_size() until now. With
>> > pg_table_size() available since PostgreSQL 9.0, I believe it would be more
>> > useful to have the total acquired storage displayed, including implicit
>> > objects (the case where it was not very useful at the moment was a table
>> > with a big TOAST table).
>>
>> I guess the threshold question for this patch is whether
>> pg_table_size() is a "more accurate" table size or just a different
>> one.
>
> Not including the toast table and index in the size is just plain wrong.
> Reporting the size without the toast objects is an implementation
> artifact that should not be done unless explicitly requested.

+1

Can we enhance the detail display for a table and show more accurate numbers?

table size: xxx
toast size: xxx
indexes size: xxx

Regards

Pavel Stehule


>
> --
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Greg Stark  Wednesday 23 March 2011 21:30:04
> On Wed, Mar 23, 2011 at 8:00 PM, Robert Haas  wrote:
> > It looks like the only way anything can ever get put on the free list
> > right now is if a relation or database is dropped.  That doesn't seem
> > too good.  I wonder if the background writer shouldn't be trying to
> > maintain the free list.  That is, perhaps BgBufferSync() should notice
> > when the number of free buffers drops below some threshold, and run
> > the clock sweep enough to get it back up to that threshold.
> 
> I think this is just a terminology discrepancy. In postgres the free
> list is only used for buffers that contain no useful data at all. The
> only time there are buffers on the free list is at startup or if a
> relation or database is dropped.
> 
> Most of the time blocks are read into buffers that already contain
> other data. Candidate buffers to evict are buffers that have been used
> least recently. That's what the clock sweep implements.
> 
> What the bgwriter's responsible for is looking at the buffers *ahead*
> of the clock sweep and flushing them to disk. They stay in ram and
> don't go on the free list, all that changes is that they're clean and
> therefore can be reused without having to do any i/o.
> 
> I'm a bit skeptical that this works because as soon as bgwriter
> saturates the i/o the os will throttle the rate at which it can write.
> When that happens even a few dozens of milliseconds will be plenty to
> allow the purely user-space processes consuming the buffers to catch
> up instantly.
> 
> But Greg Smith has done a lot of work tuning the bgwriter so that it
> is at least useful in some circumstances. I could well see it being
> useful for systems where latency matters and the i/o is not saturated.

The freelist is almost useless under normal operations, but it costs only one
check of whether it is empty or not, which could be optimized by testing
(... > -1) or !(... < 0).

Regards,
Radek



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Mar 23 17:24:59 -0300 2011:
> On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle  wrote:
> > It struck me today again that \dt+ isn't displaying the accurate table size
> > for tables, since it uses pg_relation_size() until now. With
> > pg_table_size() available since PostgreSQL 9.0, I believe it would be more
> > useful to have the total acquired storage displayed, including implicit
> > objects (the case where it was not very useful at the moment was a table
> > with a big TOAST table).
> 
> I guess the threshold question for this patch is whether
> pg_table_size() is a "more accurate" table size or just a different
> one.

Not including the toast table and index in the size is just plain wrong.
Reporting the size without the toast objects is an implementation
artifact that should not be done unless explicitly requested.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 8:20 PM, Robert Haas  wrote:
> On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs  wrote:
>> In any case, that's not the only argument for keeping it. We introduce
>> the view in this release and I would like it to stay the same from
>> now, since we know we will need that info later.
>
> At least as I understand it, it's not our project policy to carry
> around code that doesn't accomplish anything useful.  I have no
> objection to keeping the field; I simply think that if we're going to
> have it, we should make it work, as in fact it did before you changed
> it without discussion.  You haven't offered any evidence at all that
> it introduces any kind of a performance regression AT ALL, much less
> that any such regression can't be trivially patched around by
> making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
> advanced.  The onus is as much on you to justify the change as it is
> on me to justify changing it back.

What a stupid conversation.

There's no onus on me to have to keep justifying to you why the code
is the way it is, but I do.

If you want to make a change that I already know reduces performance,
you have to have a good reason. So far, you don't.

Stop fussing and wrap the release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Greg Stark
On Wed, Mar 23, 2011 at 8:00 PM, Robert Haas  wrote:
> It looks like the only way anything can ever get put on the free list
> right now is if a relation or database is dropped.  That doesn't seem
> too good.  I wonder if the background writer shouldn't be trying to
> maintain the free list.  That is, perhaps BgBufferSync() should notice
> when the number of free buffers drops below some threshold, and run
> the clock sweep enough to get it back up to that threshold.
>

I think this is just a terminology discrepancy. In postgres the free
list is only used for buffers that contain no useful data at all. The
only time there are buffers on the free list is at startup or if a
relation or database is dropped.

Most of the time blocks are read into buffers that already contain
other data. Candidate buffers to evict are buffers that have been used
least recently. That's what the clock sweep implements.

What the bgwriter's responsible for is looking at the buffers *ahead*
of the clock sweep and flushing them to disk. They stay in ram and
don't go on the free list, all that changes is that they're clean and
therefore can be reused without having to do any i/o.

I'm a bit skeptical that this works because as soon as bgwriter
saturates the i/o the os will throttle the rate at which it can write.
When that happens even a few dozens of milliseconds will be plenty to
allow the purely user-space processes consuming the buffers to catch
up instantly.

But Greg Smith has done a lot of work tuning the bgwriter so that it
is at least useful in some circumstances. I could well see it being
useful for systems where latency matters and the i/o is not saturated.
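The division of labor described above (the clock sweep picks victims; the bgwriter only cleans buffers ahead of the sweep) can be sketched roughly as follows. This is a deliberately simplified illustration, not PostgreSQL's actual buffer manager code; the class and method names are invented:

```python
class Buffer:
    def __init__(self):
        self.usage_count = 0   # bumped on access, decayed by the sweep
        self.dirty = False     # set on modification, cleaned by a "bgwriter"

class ClockSweep:
    """Simplified clock-sweep victim selection over a fixed buffer pool."""
    def __init__(self, nbuffers):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.hand = 0

    def pin(self, i):
        # Cap usage_count (PostgreSQL caps at 5) so old popularity decays.
        b = self.buffers[i]
        b.usage_count = min(b.usage_count + 1, 5)

    def evict(self):
        """Advance the hand, decaying usage counts, until a buffer with
        usage_count == 0 is found; that buffer is the eviction victim."""
        while True:
            b = self.buffers[self.hand]
            victim = self.hand
            self.hand = (self.hand + 1) % len(self.buffers)
            if b.usage_count == 0:
                if b.dirty:
                    b.dirty = False   # stand-in for a write to disk
                return victim
            b.usage_count -= 1

pool = ClockSweep(4)
pool.pin(0); pool.pin(0); pool.pin(2)
print(pool.evict())  # buffer 1: the first buffer found with usage_count 0
```

In this sketch a bgwriter would simply walk ahead of `hand` clearing `dirty` flags, so that `evict()` rarely has to do the write itself, which is the point Greg makes above.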

-- 
greg



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Robert Haas
On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle  wrote:
> It struck me today again that \dt+ isn't displaying the accurate table size
> for tables, since it uses pg_relation_size() until now. With
> pg_table_size() available since PostgreSQL 9.0, I believe it would be more
> useful to have the total acquired storage displayed, including implicit
> objects (the case where it was not very useful at the moment was a table
> with a big TOAST table).

I guess the threshold question for this patch is whether
pg_table_size() is a "more accurate" table size or just a different
one.  It could possibly be confusing to display one value in that
column when the server is >= 9.0 and the client is >= 9.1, and a
different value when the server is < 9.0 or the client is < 9.1.

On the other hand, it's clear that there are several people in favor
of this change, so maybe we should just go ahead and do it.  Not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs  wrote:
> In any case, that's not the only argument for keeping it. We introduce
> the view in this release and I would like it to stay the same from
> now, since we know we will need that info later.

At least as I understand it, it's not our project policy to carry
around code that doesn't accomplish anything useful.  I have no
objection to keeping the field; I simply think that if we're going to
have it, we should make it work, as in fact it did before you changed
it without discussion.  You haven't offered any evidence at all that
it introduces any kind of a performance regression AT ALL, much less
that any such regression can't be trivially patched around by
making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
advanced.  The onus is as much on you to justify the change as it is
on me to justify changing it back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 1:53 PM, Jim Nasby  wrote:
> When we started using 192G servers we tried switching our largest OLTP 
> database (would have been about 1.2TB at the time) from 8GB shared buffers to 
> 28GB. Performance went down enough to notice; I don't have any solid metrics, 
> but I'd ballpark it at 10-15%.
>
> One thing that I've always wondered about is the logic of having backends run 
> the clocksweep on a normal basis. OS's that use clock-sweep have a dedicated 
> process to run the clock in the background, with the intent of keeping X 
> amount of pages on the free list. We actually have most of the mechanisms to 
> do that, we just don't have the added process. I believe bg_writer was 
> intended to handle that, but in reality I don't think it actually manages to 
> keep much of anything on the free list. Once we have a performance testing 
> environment I'd be interested to test a modified version that includes a 
> dedicated background clock sweep process that strives to keep X amount of 
> buffers on the free list.

It looks like the only way anything can ever get put on the free list
right now is if a relation or database is dropped.  That doesn't seem
too good.  I wonder if the background writer shouldn't be trying to
maintain the free list.  That is, perhaps BgBufferSync() should notice
when the number of free buffers drops below some threshold, and run
the clock sweep enough to get it back up to that threshold.

On a related note, I've been thinking about whether we could make
bgwriter_delay adaptively self-tuning.  If we notice that we
overslept, we don't sleep as long the next time; if not much happens
while we sleep, we sleep longer the next time.
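An adaptive delay of that sort is often implemented as a simple multiplicative adjustment with clamps. A hypothetical sketch; the halving/growth factors and bounds here are invented for illustration, not values anyone has proposed:

```python
def adjust_delay(delay_ms, overslept, work_done, lo=10, hi=10000):
    """Adapt a sleep interval: shrink it after oversleeping (we fell behind),
    grow it when a cycle found nothing to do (we are polling too often)."""
    if overslept:
        delay_ms /= 2          # react faster next time
    elif not work_done:
        delay_ms *= 1.5        # idle cycle: back off
    return max(lo, min(hi, delay_ms))  # clamp to sane bounds

d = 200.0
d = adjust_delay(d, overslept=True, work_done=True)    # halved to 100.0
d = adjust_delay(d, overslept=False, work_done=False)  # grew to 150.0
print(d)
```

The asymmetry (halve quickly, grow slowly) is the usual choice for this kind of controller, since undersleeping costs CPU while oversleeping risks falling behind the workload.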

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 7:29 PM, Robert Haas  wrote:
> On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs  wrote:
>> On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas  wrote:
>>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs  wrote:
> Specifically, if we're not going to remove write location, then I
> think we need to apply something like the attached.

 The protocol supports different write/fsync values, so the view should
 display them.
>>>
>>> That's exactly the point.
>>
>> No, it's not.
>>
>>> Currently, we have a protocol that supports
>>> different write and fsync values, but the code as written does not
>>> actually ever send a reply at any time when the two values can ever be
>>> different.  So there is no point in sending both of them.  The write
>>> location is completely redundant with the fsync location and therefore
>>> completely useless.  We shouldn't bother sending the value twice, or
>>> displaying it twice, if it's absolutely 100% guaranteed to be
>>> identical in every case.
>>
>> As of 9.1, we now support other tools that use the protocol, so you
>> cannot assume you know what is being sent, just because one sender has
>> certain characteristics.
>
> Oh, really?  Is this strictly hypothetical or is such a beast
> planned/already in existence?

Ask Magnus.

In any case, that's not the only argument for keeping it. We introduce
the view in this release and I would like it to stay the same from
now, since we know we will need that info later.

No more minor tweaks, please.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs  wrote:
> On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas  wrote:
>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs  wrote:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.
>>>
>>> The protocol supports different write/fsync values, so the view should
>>> display them.
>>
>> That's exactly the point.
>
> No, it's not.
>
>> Currently, we have a protocol that supports
>> different write and fsync values, but the code as written does not
>> actually ever send a reply at any time when the two values can ever be
>> different.  So there is no point in sending both of them.  The write
>> location is completely redundant with the fsync location and therefore
>> completely useless.  We shouldn't bother sending the value twice, or
>> displaying it twice, if it's absolutely 100% guaranteed to be
>> identical in every case.
>
> As of 9.1, we now support other tools that use the protocol, so you
> cannot assume you know what is being sent, just because one sender has
> certain characteristics.

Oh, really?  Is this strictly hypothetical or is such a beast
planned/already in existence?

I'm just afraid this is going to be confusing to users who will expect
it to do something that it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Susanne Ebrecht

Hello Bernd,

On 21.03.2011 18:44, Bernd Helmle wrote:


The attached minor patch extends \dt to use pg_table_size() starting with
PostgreSQL 9.0; I'm not sure if we backport such changes, though. It would
be interesting for 9.1, however.


As I already told you:

I tested and it worked.
The code looks correct to me.

You should just send the code to a beauty farm: the wrinkles (braces)
could be placed better, and there could be more of them. :)


Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com




Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Jim Nasby
On Mar 22, 2011, at 2:53 PM, Robert Haas wrote:
> On Tue, Mar 22, 2011 at 11:24 AM, Jeff Janes  wrote:
>> On Fri, Mar 18, 2011 at 9:19 AM, Robert Haas  wrote:
>>> On Fri, Mar 18, 2011 at 11:14 AM, Kevin Grittner
>>>  wrote:
 Maybe the thing to focus on first is the oft-discussed "benchmark
 farm" (similar to the "build farm"), with a good mix of loads, so
 that the impact of changes can be better tracked for multiple
 workloads on a variety of platforms and configurations.  Without
 something like that it is very hard to justify the added complexity
 of an idea like this in terms of the performance benefit gained.
>>> 
>>> A related area that could use some looking at is why performance tops
>>> out at shared_buffers ~8GB and starts to fall thereafter.
>> 
>> Under what circumstances does this happen?  Can a simple pgbench -S
>> with a large scaling factor elicit this behavior?
> 
> To be honest, I'm mostly just reporting what I've heard Greg Smith say
> on this topic.   I don't have any machine with that kind of RAM.

When we started using 192G servers we tried switching our largest OLTP database 
(would have been about 1.2TB at the time) from 8GB shared buffers to 28GB. 
Performance went down enough to notice; I don't have any solid metrics, but I'd 
ballpark it at 10-15%.

One thing that I've always wondered about is the logic of having backends run 
the clocksweep on a normal basis. OS's that use clock-sweep have a dedicated 
process to run the clock in the background, with the intent of keeping X amount 
of pages on the free list. We actually have most of the mechanisms to do that, 
we just don't have the added process. I believe bg_writer was intended to 
handle that, but in reality I don't think it actually manages to keep much of 
anything on the free list. Once we have a performance testing environment I'd 
be interested to test a modified version that includes a dedicated background 
clock sweep process that strives to keep X amount of buffers on the free list.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 12:44 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Specifically, if we're not going to remove write location, then I
>> think we need to apply something like the attached.
>
>>                       while (walrcv_receive(0, &type, &buf, &len))
>>                               XLogWalRcvProcessMsg(type, buf, len);
>
>> +                     /* Let the master know that we received some data. */
>> +                     XLogWalRcvSendReply();
>
> What if we didn't actually receive any new data?

The portion of the code immediately preceding what's included in the
diff guards against that, and there is a second guard in
XLogWalRcvSendReply().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs  wrote:
>> Specifically, if we're not going to remove write location, then I
>> think we need to apply something like the attached.
>
> The protocol supports different write/fsync values, so the view should
> display them.

That's exactly the point.  Currently, we have a protocol that supports
different write and fsync values, but the code as written does not
actually ever send a reply at any time when the two values can ever be
different.  So there is no point in sending both of them.  The write
location is completely redundant with the fsync location and therefore
completely useless.  We shouldn't bother sending the value twice, or
displaying it twice, if it's absolutely 100% guaranteed to be
identical in every case.

The point of the patch that I posted is that it restores the previous
behavior, where we send an update before flushing WAL and again after
flushing WAL.  If we do that, then the write location can be ahead of
the flush location when we've written but not flushed.  If we don't do
that, and only send replies after flushing everything, then the two
fields are perforce always the same on the master.  I don't see that
as being a useful behavior, and in fact I think it could be quite
confusing.  Someone might assume that if we bother to expose both a
write_location and a flush_location, they are somehow different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Guillaume Lelarge
Le 23/03/2011 17:53, Tom Lane a écrit :
> Robert Haas  writes:
>> On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
>>  wrote:
>>> While working on adding support for SQL/Med objects to pgAdmin, I'm
>>> quite surprised to see there is no way to add comments to SQL/Med
>>> objects. Is this on purpose or is it just something that was simply missed?
> 
>> I think it's an oversight.  We should probably fix this.
> 
> Yeah, I had a private TODO about that.  I'd like to see if we can
> refactor the grammar to eliminate some of the duplication there
> as well as the potential for oversights of this sort.  I believe
> that USER MAPPINGs are missing from ObjectType as well as a bunch
> of other basic places ...
> 

OK, great. Thanks for your answers.


-- 
Guillaume
 http://www.postgresql.fr
 http://dalibo.com



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
>  wrote:
>> While working on adding support for SQL/Med objects to pgAdmin, I'm
>> quite surprised to see there is no way to add comments to SQL/Med
>> objects. Is this on purpose or is it just something that was simply missed?

> I think it's an oversight.  We should probably fix this.

Yeah, I had a private TODO about that.  I'd like to see if we can
refactor the grammar to eliminate some of the duplication there
as well as the potential for oversights of this sort.  I believe
that USER MAPPINGs are missing from ObjectType as well as a bunch
of other basic places ...

regards, tom lane



Re: [HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Tom Lane
Robert Haas  writes:
> Specifically, if we're not going to remove write location, then I
> think we need to apply something like the attached.

>   while (walrcv_receive(0, &type, &buf, &len))
>   XLogWalRcvProcessMsg(type, buf, len);
 
> + /* Let the master know that we received some data. */
> + XLogWalRcvSendReply();

What if we didn't actually receive any new data?

regards, tom lane



[HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 3:35 PM, Robert Haas  wrote:
> On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas  wrote:
>> On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs  wrote:
 I agree to get rid of write_location.
>>>
>>> No, don't remove it.
>>>
>>> We seem to be just looking for things to tweak without any purpose.
>>> Removing this adds nothing for us.
>>>
>>> We will have the column in the future, it is there now, so leave it.
>>
>> Well then can we revert the part of your patch that causes it to not
>> actually work any more?
>
> Specifically, if we're not going to remove write location, then I
> think we need to apply something like the attached.

The protocol supports different write/fsync values, so the view should
display them.
We don't know what the standby end will be doing with the data in all cases.

For the main server, making the additional change will just decrease
performance, for no benefit.

In the future we would have a parameter that says how often we send
replies, but there's no point having a parameter if there is only one
meaningful value for standby servers currently.

Please leave this as it is now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Robert Haas
On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
 wrote:
> While working on adding support for SQL/Med objects to pgAdmin, I'm
> quite surprised to see there is no way to add comments to SQL/Med
> objects. Is this on purpose or is it just something that was simply missed?

I think it's an oversight.  We should probably fix this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Fri, Mar 18, 2011 at 10:10 AM, Robert Haas  wrote:
> On Mon, Mar 7, 2011 at 3:44 AM, Fujii Masao  wrote:
>> On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao  wrote:
>>> On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs  wrote:
 Efficient transaction-controlled synchronous replication.
 If a standby is broadcasting reply messages and we have named
 one or more standbys in synchronous_standby_names then allow
 users who set synchronous_replication to wait for commit, which
 then provides strict data integrity guarantees. Design avoids
 sending and receiving transaction state information so minimises
 bookkeeping overheads. We synchronize with the highest priority
 standby that is connected and ready to synchronize. Other standbys
 can be defined to takeover in case of standby failure.

 This version has very strict behaviour; more relaxed options
 may be added at a later date.
>>>
>>> Pretty cool! I'd appreciate very much your efforts and contributions.
>>>
>>> And, I found one bug ;) You seem to have wrongly removed the check
>>> of max_wal_senders in SyncRepWaitForLSN. This can make the
>>> backend wait for replication even if max_wal_senders = 0. I could produce
>>> this problematic situation in my machine. The attached patch fixes this 
>>> problem.
>>
>>        if (strlen(SyncRepStandbyNames) > 0 && max_wal_senders == 0)
>>                ereport(ERROR,
>>                                (errmsg("Synchronous replication requires WAL streaming (max_wal_senders > 0)")));
>>
>> The above check should be required also after pg_ctl reload since
>> synchronous_standby_names can be changed by SIGHUP?
>> Or how about just removing that? If the patch I submitted is
>> committed, the empty synchronous_standby_names and max_wal_senders = 0
>> settings are no longer unsafe.
>
> This configuration is now harmless in the sense that it no longer
> horribly breaks the entire system, but it's still pretty useless, so
> this might be deemed a valuable sanity check.  However, I'm reluctant
> to leave it in there, because someone could change their config to
> this state, pg_ctl reload, see everything working, and then later stop
> the cluster and be unable to start it back up again.  Since most
> people don't shut their database systems down very often, they might
> not discover that they have an invalid config until much later.  I
> think it's probably not a good idea to have configs that are valid on
> reload but prevent startup, so I'm inclined to either remove this
> check altogether or downgrade it to a warning.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure  Tuesday 22 March 2011 23:06:02
> On Tue, Mar 22, 2011 at 4:28 PM, Radosław Smogura
> 
>  wrote:
> > Merlin Moncure  Monday 21 March 2011 20:58:16
> > 
> >> On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark  wrote:
> >> > On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure 
> > 
> > wrote:
> >> >> Can't you make just one large mapping and lock it in 8k regions? I
> >> >> thought the problem with mmap was not being able to detect other
> >> >> processes
> >> >> (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html)
> >> >> compatibility issues (possibly obsolete), etc.
> >> > 
> >> > I was assuming that locking part of a mapping would force the kernel
> >> > to split the mapping. It has to record the locked state somewhere so
> >> > it needs a data structure that represents the size of the locked
> >> > section and that would, I assume, be the mapping.
> >> > 
> >> > It's possible the kernel would not in fact fall over too badly doing
> >> > this. At some point I'll go ahead and do experiments on it. It's a bit
> >> > fraught though as it the performance may depend on the memory
> >> > management features of the chipset.
> >> > 
> >> > That said, that's only part of the battle. On 32bit you can't map the
> >> > whole database as your database could easily be larger than your
> >> > address space. I have some ideas on how to tackle that but the
> >> > simplest test would be to just mmap 8kB chunks everywhere.
> >> 
> >> Even on 64 bit systems you only have 48 bit address space which is not
> >> a theoretical  limitation.  However, at least on linux you can map in
> >> and map out pretty quick (10 microseconds paired on my linux vm) so
> >> that's not so big of a deal.  Dealing with rapidly growing files is a
> >> problem.  That said, probably you are not going to want to reserve
> >> multiple gigabytes in 8k non contiguous chunks.
> >> 
> >> > But it's worse than that. Since you're not responsible for flushing
> >> > blocks to disk any longer you need some way to *unlock* a block when
> >> > it's possible to be flushed. That means when you flush the xlog you
> >> > have to somehow find all the blocks that might no longer need to be
> >> > locked and atomically unlock them. That would require new
> >> > infrastructure we don't have though it might not be too hard.
> >> > 
> >> > What would be nice is a mlock_until() where you eventually issue a
> >> > call to tell the kernel what point in time you've reached and it
> >> > unlocks everything older than that time.
> >> 
> >> I wonder if there is any reason to mlock at all...if you are going to
> >> 'do' mmap, can't you just hide under current lock architecture for
> >> actual locking and do direct memory access without mlock?
> >> 
> >> merlin
> > 
> > Actually, after dealing with mmap and adding munmap, I found a crucial
> > reason not to use mmap:
> > You need to munmap, and for me this takes much time; even if I read with
> > MAP_SHARED | PROT_READ, it looks like Linux does a flush or something
> > else, same as with MAP_FIXED, MAP_PRIVATE, etc.
> 
> can you produce a small program demonstrating the problem?  This is not
> how things should work AIUI.
> 
> I was thinking about playing with an mmap implementation of the clog
> system -- it's perhaps a better fit.  clog is of rigidly defined size, and
> has very high performance requirements.  Also it's a much smaller change
> than reimplementing heap buffering, and maybe not so much affected by
> munmap.
> 
> merlin

Ah... just one thing, maybe useful for understanding why performance is lost
with huge memory: I saw mmapped buffers are allocated at something like 0x007,
so definitely above 4GB.



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure  Monday 21 March 2011 20:58:16
> On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark  wrote:
> > On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure  
wrote:
> >> Can't you make just one large mapping and lock it in 8k regions? I
> >> thought the problem with mmap was not being able to detect other
> >> processes
> >> (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html)
> >> compatibility issues (possibly obsolete), etc.
> > 
> > I was assuming that locking part of a mapping would force the kernel
> > to split the mapping. It has to record the locked state somewhere so
> > it needs a data structure that represents the size of the locked
> > section and that would, I assume, be the mapping.
> > 
> > It's possible the kernel would not in fact fall over too badly doing
> > this. At some point I'll go ahead and do experiments on it. It's a bit
> > fraught though as it the performance may depend on the memory
> > management features of the chipset.
> > 
> > That said, that's only part of the battle. On 32bit you can't map the
> > whole database as your database could easily be larger than your
> > address space. I have some ideas on how to tackle that but the
> > simplest test would be to just mmap 8kB chunks everywhere.
> 
> Even on 64 bit systems you only have 48 bit address space which is not
> a theoretical  limitation.  However, at least on linux you can map in
> and map out pretty quick (10 microseconds paired on my linux vm) so
> that's not so big of a deal.  Dealing with rapidly growing files is a
> problem.  That said, probably you are not going to want to reserve
> multiple gigabytes in 8k non contiguous chunks.
> 
> > But it's worse than that. Since you're not responsible for flushing
> > blocks to disk any longer you need some way to *unlock* a block when
> > it's possible to be flushed. That means when you flush the xlog you
> > have to somehow find all the blocks that might no longer need to be
> > locked and atomically unlock them. That would require new
> > infrastructure we don't have though it might not be too hard.
> > 
> > What would be nice is a mlock_until() where you eventually issue a
> > call to tell the kernel what point in time you've reached and it
> > unlocks everything older than that time.
> 
> I wonder if there is any reason to mlock at all...if you are going to
> 'do' mmap, can't you just hide under current lock architecture for
> actual locking and do direct memory access without mlock?
> 
> merlin
I can't reproduce this. A simple test shows a 2x faster read with mmap than
with read().

I'm sending what I've done with mmap (really ugly, but I'm in the forest). It
is a read-only solution, so init the database first with some amount of data
(I have about 300MB); the 2nd-level scripts may do this for you.

This is what I found:
1. If I do not require the new mmap (mmap with MAP_FIXED) to land in the
previous region (i.e., I just do munmap/mmap with each query), execution time
grows by about 10%.

2. Sometimes it is enough just to comment or uncomment something that has no
side effects on code flow (bufmgr.c: (un)comment some unused if, put in NULL
where it will be replaced), and query execution time may grow 2x.

3. My initial solution was 2% faster (about 9 ms when reading); now it's 10%
slower, after making things more usable.

Regards,
Radek


pg_mmap_20110323.patch.bz2
Description: application/bzip



[HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas  wrote:
> On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs  wrote:
>>> I agree to get rid of write_location.
>>
>> No, don't remove it.
>>
>> We seem to be just looking for things to tweak without any purpose.
>> Removing this adds nothing for us.
>>
>> We will have the column in the future, it is there now, so leave it.
>
> Well then can we revert the part of your patch that causes it to not
> actually work any more?

Specifically, if we're not going to remove write location, then I
think we need to apply something like the attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


write-location-fix.patch
Description: Binary data



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 8:16 AM, Markus Wanner  wrote:
> On 03/23/2011 12:52 PM, Robert Haas wrote:
>> Yes.  What this won't do is let you build a big load-balancing network
>> (at least not without great caution about what you assume).
>
> This sounds too strong to me.  Session-aware load balancing is pretty
> common these days.  It's the default mode of PgBouncer, for example.
> Not much caution required there, IMO.  Or what pitfalls did you have in
> mind?

Well, just the one we were talking about: a COMMIT on one node doesn't
guarantee that the transaction is visible on the other node, just
that it will become visible there eventually, even if a crash happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 9:38 AM, Simon Riggs  wrote:
>> Actually, my previous email was all nonsense, wasn't it?  If we don't
>> reach the consistency point, we can't enter normal running anyway -
>> shut down is the only option no matter what.
>
> Presumably you mean that the way it's currently coded is the way it should
> stay?

Uh, maybe, but it's not obvious to me that it actually is coded that
way.  I don't see any safeguard that prevents recovery from pausing
before consistency is reached.  Is there one?  Where?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2011 at 1:16 AM, Jesper Krogh  wrote:
> On 2011-03-22 21:43, Robert Haas wrote:
>>
>> I took a crack at implementing the first approach described above,
>> which seems to be by far the simplest idea we've come up with to date.
>>  Patch attached.  It doesn't seem to be that complicated, which could
>> mean either that it's not that complicated or that I'm missing
>> something.  Feel free to point and snicker in the latter case.
>
> Looks simple, but there is no benefit on the usage side in the patch,
> so it isn't really "testable" yet? I would love to spend some time testing
> when it's doable (even with rough corners.)
>
> I'm still a bit puzzled about how it would end up working with a page-level
> visibility map bit for index-scans. There is a clear "drop off" in
> usability
> when the change rate of the table goes up, which may or may not be
> relevant, but I cannot really judge, since I haven't even got a ballpark
> figure about how much table churn would disable say 50% of the usage.

How much benefit you are going to get is going to be really workload
dependent.  In a lot of cases the distribution of writes is going to be
really non-uniform, so that a small percentage of records gets the
majority of the writes across the database generally.  Reliable
PD_ALL_VISIBLE opens the door to optimizing around this pattern, which
I'd estimate the vast majority of databases follow to varying degrees.

It's really hard to overemphasize how important in performance terms
are the features that mitigate the relative downsides of our mvcc
implementation.  The  HOT feature in 8.3 was an absolute breakthrough
in terms of postgres performance and I expect this will open similar
doors.

merlin



[HACKERS] corner case about replication and shutdown

2011-03-23 Thread Fujii Masao
Hi,

When I read the shutdown code to create the smart shutdown patch for sync rep,
I found the corner case where shutdown can get stuck infinitely. This happens
when postmaster reaches PM_WAIT_BACKENDS state before walsender marks
itself as WAL sender process for streaming WAL (i.e., before walsender calls
MarkPostmasterChildWalSender). In this case, CountChildren(NORMAL) in
PostmasterStateMachine() returns non-zero because a normal backend (i.e.,
the would-be walsender) is running, and the postmaster in PM_WAIT_BACKENDS
state gets out of PostmasterStateMachine(). Then the backend receives the
START_REPLICATION command, declares itself a walsender, and
CountChildren(NORMAL) returns zero.

The problem is that this declaration doesn't trigger
PostmasterStateMachine() at all.
So, even though there is no normal backends, postmaster cannot call
PostmasterStateMachine() and move its state from PM_WAIT_BACKENDS.

I think this problem is harmless in practice since it doesn't happen
often. But it can happen...

The simple fix is to change ServerLoop() so that it periodically calls
PostmasterStateMachine() while a shutdown is running. I also thought about
changing PostmasterStateMachine() itself, but that looked complicated. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 11:50 AM, Robert Haas  wrote:
> On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao  wrote:
>> On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas  wrote:
>>> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
>>>  wrote:
 You could also argue for "log a warning, continue until we can open for Hot
 standby, then pause".
>>>
>>> I don't like that one much.
>>>
 I can write the patch once we know what we want. All of those options sound
 reasonable to me. This is such a corner-case that it doesn't make sense to
 make it user-configurable, though.
>>>
>>> I agree.  Since pause_at_recovery_target is ignored when
>>> hot_standby=off, I think it would be consistent to treat the case
>>> where hot_standby=on but can't actually be initiated the same way -
>>> just ignore the pause request and enter normal running.
>>
>> When hot_standby = on and the recovery target is ahead of the consistent 
>> point,
>> the server doesn't enter normal running since FATAL error happens. So I think
>> that it's more consistent to prevent the server from entering normal
>> running also
>> when hot_standby = off.
>
> Actually, my previous email was all nonsense, wasn't it?  If we don't
> reach the consistency point, we can't enter normal running anyway -
> shut down is the only option no matter what.

Presumably you mean that the way it's currently coded is the way it should stay?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] copy / paste object - preliminary part

2011-03-23 Thread Vladimir Kokovic
Hi,

I have done a preliminary part of the copy/paste object functions, and now
I would like confirmation that the implementation is correct.

Changed files:
frm/frmMain.cpp, include/frm/frmMain.h

New files:
frm/frmPasteObject.cpp, include/frm/frmPasteObject.h

Best regards,
Vladimir Kokovic, DP senior, Belgrade, Serbia


pgadmin3.diff
Description: Binary data
//
//
// pgAdmin III - PostgreSQL Tools
//
// Copyright (C) 2002 - 2011, The pgAdmin Development Team
// This software is released under the PostgreSQL Licence
//
// frmPasteObject.cpp - Copy/Paste object functions
//
//

#include 

#include "pgAdmin3.h"
#include "frm/frmPasteObject.h"
#include "schema/pgSchema.h"
#include "schema/pgTable.h"

frmPasteObject::frmPasteObject(frmMain *form, pgObject *sourceobj, pgObject *targetobj)
{
	this->mainform = form;
	this->sourceobj= sourceobj;
	this->targetobj= targetobj;
}

wxArrayString *getSchemaTables(pgSchema *srcschema)
{
	wxArrayString *objArray = new wxArrayString();

	wxString query = wxT("SELECT relname ")
	 wxT("FROM pg_namespace n ")
	 wxT("LEFT JOIN pg_class c ON n.oid=c.relnamespace AND relkind='r' ")
	 wxT("WHERE nspname='") + srcschema->GetIdentifier() + wxT("'");
	query += wxT(" ORDER BY relname");

	pgSet *objects = srcschema->GetDatabase()->ExecuteSet(query);

	if (objects)
	{
		while (!objects->Eof())
		{
			if (!objects->GetVal(wxT("relname")).IsNull())
			{
				objArray->Add(objects->GetVal(wxT("relname")));
			}
			objects->MoveNext();
		}
		delete objects;
	}

	return objArray;
}

void frmPasteObject::process()
{
	if (!sourceobj || !targetobj)
	{
		return;
	}

	wxArrayString *srcObjArray;
	pgSchema *targetschema = (pgSchema *)targetobj;
	pgSchema *srcschema = 0;
	pgTable *table = (sourceobj->GetMetaType() == PGM_TABLE) ? (pgTable *)sourceobj : 0;
	if (table)
	{
		wxMessageBox(
			wxT("Paste source table\n") +
			table->GetSchema()->GetDatabase()->GetIdentifier() + wxT(".") + table->GetSchema()->GetIdentifier() + wxT(".") + table->GetIdentifier() + wxT("\n") +
			wxT(" into schema\n") + targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier());
	}
	else
	{
		srcschema = (pgSchema *)sourceobj;
		wxMessageBox(
			wxT("Paste source schema objects\n") +
			srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier() + wxT("\n") +
			wxT(" into schema\n") +
			targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier());
	}

	if (!sourceobj->GetConnection() || !targetobj->GetConnection())
	{
		wxMessageBox(
			_("Both source and target schema connections must be established before the paste object operation!"));
		return;
	}

	if (srcschema)
	{
		srcObjArray = ::getSchemaTables(srcschema);
	}
	else
	{
		srcObjArray = new wxArrayString();
		srcObjArray->Add(table->GetIdentifier());
		srcschema = table->GetSchema();
	}

	wxString msg;
	for(unsigned int i = 0; i < srcObjArray->Count(); i++)
	{
		msg = wxT("COPY TABLE:") +
			srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier() + wxT(".") + srcObjArray->Item(i) +
			wxT(" INTO:") + targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier();
		mainform->GetStatusBar()->SetStatusText(msg, 1);
		//future implementation
	}
	msg = wxString::Format(wxT("%d TABLE(s) COPIED FROM %s TO %s"), (int) srcObjArray->Count(),
		(srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier()).c_str(),
		(targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier()).c_str());
	mainform->GetStatusBar()->SetStatusText(msg, 1);

	delete srcObjArray;
}

frmPasteObject::~frmPasteObject()
{
}

//
//
// pgAdmin III - PostgreSQL Tools
//
// Copyright (C) 2002 - 2011, The pgAdmin Development Team
// This software is released under the PostgreSQL Licence
//
// frmPasteObject.h - Copy/Paste object functions
//
//

#ifndef FRMPASTEOBJECT_H
#define	FRMPASTEOBJECT_H

#include "frm/frmMain.h"
#include "schema/pgObject.h"

class frmPasteObject
{
public:
	frmPasteObject(frmMain *form, pgObject *sourceobj, pgObject *targetobj);
	void process();
	virtual ~frmPasteObject();
private:
	frmMain *mainform;
	pgObject *sourceobj;
	pgObject *targetobj;
};

#endif	/* FRMPASTEOBJECT_H */




Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Markus Wanner
On 03/23/2011 12:52 PM, Robert Haas wrote:
> Yes.  What this won't do is let you build a big load-balancing network
> (at least not without great caution about what you assume).

This sounds too strong to me.  Session-aware load balancing is pretty
common these days.  It's the default mode of PgBouncer, for example.
Not much caution required there, IMO.  Or what pitfalls did you have in
mind?

> What it
> will do is make it really, really hard to lose committed transactions.
> Both good things, but different.

...you can still get both at the same time.  At least as long as you are
happy with session-aware load balancing.  And who really needs finer
grained balancing?

(Note that no matter how fine-grained you balance, you are still bound
to a (single core of a) single node.  That changes with distributed
querying, and things really start to get interesting there... but we are
far from that, yet).

Regards

Markus



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:16 AM, Jesper Krogh  wrote:
> On 2011-03-22 21:43, Robert Haas wrote:
>>
>> I took a crack at implementing the first approach described above,
>> which seems to be by far the simplest idea we've come up with to date.
>>  Patch attached.  It doesn't seem to be that complicated, which could
>> mean either that it's not that complicated or that I'm missing
>> something.  Feel free to point and snicker in the latter case.
>
> Looks simple, but there is no benefit on the usage side in the patch,
> so it isn't really "testable" yet? I would love to spend some time testing
> when it's doable (even with rough corners.)

What it probably needs right now is some crash testing - insert a
database panic at various points in the code and then check whether
the state after recovery is still OK.  Also some code review from
people who understand recovery better than me.  *waves to Heikki*

There's a lot more work that will have to be done before this starts
to produce user-visible performance benefits, and then a lot more work
after that before we've exhausted all the possibilities.  I can't cope
with all that right now.  This is basic infrastructure, that will
eventually enable a variety of cool stuff, but isn't particularly sexy
by itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:29 AM, Gokulakannan Somasundaram
 wrote:
>> All operations that clear the bit area are already WAL-logged.
>>
> Is it the case with visibility map also?
> Thanks.

Yes.  Look at the comment that the patch removes.  That describes the
problem being fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_ctl restart - behaviour based on wrong instance

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 1:48 AM, Fujii Masao  wrote:
> On Sat, Mar 19, 2011 at 10:20 AM, Robert Haas  wrote:
>> On Fri, Mar 18, 2011 at 1:19 PM, Erik Rijkers  wrote:
>>> This is OK and expected.  But then it continues (in the logfile) with:
>>>
>>> FATAL:  lock file "postmaster.pid" already exists
>>> HINT:  Is another postmaster (PID 20519) running in data directory
>>> "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?
>>>
>>> So, complaints about the *other* instance.  It doesn't happen once a 
>>> successful start (with pg_ctl
>>> start) has happened.
>>
>> I'm guessing that leftover postmaster.pid contents might be
>> responsible for this?
>
> The cause is that "pg_ctl restart" uses the postmaster.opts file that was
> created on the primary. Since its content was something like
> "pg_ctl -D vanilla_1/data", vanilla_1/data/postmaster.pid was checked
> wrongly.
>
> The simple workaround is to exclude postmaster.opts from the backup
> as well as postmaster.pid. But when postmaster.opts doesn't exist,
> "pg_ctl restart" cannot start up the server. We might also need to change
> the code of "pg_ctl restart" so that it does just "pg_ctl start" when
> postmaster.opts doesn't exist.

Sounds reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 3:27 AM, Markus Wanner  wrote:
> On 03/22/2011 09:33 PM, Robert Haas wrote:
>> We might have a version of synchronous replication that works this way
>> some day, but it's not the version we're shipping with 9.1.  The slave
>> acknowledges the WAL records when they hit the disk (i.e. fsync) not
>> when they are applied; WAL apply can lag arbitrarily.  The point is to
>> guarantee clients that the WAL is on disk somewhere and that it will
>> be replayed in the event of a failover.  Despite the fact that this
>> doesn't work as you're describing, it's a useful feature in its own
>> right.
>
> In that sense, our approach may be more synchronous than most others,
> because after the ACK is sent from the slave, the slave still needs to
> apply the transaction data from WAL before it gets visible, while the
> master needs to wait for the ACK to arrive at its side, before making it
> visible there.
>
> Ideally, these two latencies (disk seek and network induced) are just
> about equal.  But of course, there's no such guarantee.  So whenever one
> of the two is off by an order of magnitude or two (by use case or due to
> a temporary overload), either the master or the slave may lag behind the
> other machine.
>
> What pleases me is that the guarantee from the slave is somewhat similar
> to Postgres-R's: with its ACK, the receiving node doesn't guarantee the
> transaction *is* applied locally, it just guarantees that it *will* be
> able to do so sometime in the future.  Kind of a mind twister, though...

Yes.  What this won't do is let you build a big load-balancing network
(at least not without great caution about what you assume).  What it
will do is make it really, really hard to lose committed transactions.
Both good things, but different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao  wrote:
> On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas  wrote:
>> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
>>  wrote:
>>> You could also argue for "log a warning, continue until we can open for Hot
>>> standby, then pause".
>>
>> I don't like that one much.
>>
>>> I can write the patch once we know what we want. All of those options sound
>>> reasonable to me. This is such a corner-case that it doesn't make sense to
>>> make it user-configurable, though.
>>
>> I agree.  Since pause_at_recovery_target is ignored when
>> hot_standby=off, I think it would be consistent to treat the case
>> where hot_standby=on but can't actually be initiated the same way -
>> just ignore the pause request and enter normal running.
>
> When hot_standby = on and the recovery target is ahead of the consistent
> point, the server doesn't enter normal running, since a FATAL error happens.
> So I think that it's more consistent to prevent the server from entering
> normal running also when hot_standby = off.

Actually, my previous email was all nonsense, wasn't it?  If we don't
reach the consistency point, we can't enter normal running anyway -
shutting down is the only option no matter what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] writing a script to examine dead tuples

2011-03-23 Thread Heikki Linnakangas

On 23.03.2011 01:36, aaronenabs wrote:

Can anyone help me? I am trying to carry out an investigation
which involves accessing dead tuples within PostgreSQL.

I have been advised that I could write a script that allows me to
examine dead tuples, and am seeking advice on how to achieve this,
or pointers that can send me in the right direction. I am
relatively new to this forum and to PostgreSQL, and have been
assigned PostgreSQL as a topic within my studies, which has led me to
creating a script to perform this task.

If anyone can help me I would be really grateful.


One approach is to use the pageinspect contrib module or pg_filedump to 
view the pages and tuples at a low level. Another is to modify 
PostgreSQL sources to return dead tuples that would normally be skipped.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Replication server timeout patch

2011-03-23 Thread Heikki Linnakangas

On 16.03.2011 11:11, Fujii Masao wrote:

On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao  wrote:

Agreed. I'll change the patch.


Done. I attached the updated patch.


I don't much like the API for this. Walsender shouldn't need to know 
about the details of the FE/BE protocol; pq_putbytes_if_available() 
seems too low-level to be useful.


I think a better API would be to have a non-blocking version of 
pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, 
so that when the message doesn't fit in the output buffer in 
pq_putmessage(), the buffer is enlarged instead of trying to flush it.


Attached is a patch using that approach. This is a much smaller patch, 
and easier to understand. I'm not totally happy with the walsender main 
loop, it seems to work as it is, but the logic has become quite 
complicated. Ideas welcome on how to simplify that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;

   
  
+
+ 
+  replication_timeout (integer)
+  
+   replication_timeout configuration parameter
+  
+  
+   
+Specifies the maximum time, in milliseconds, to wait for the reply
+from the standby before terminating replication.  This is useful for
+the primary server to detect the standby crash or network outage.
+A value of zero turns this off.  This parameter can only be set in
+the postgresql.conf file or on the server command line.
+The default value is 60 seconds.
+   
+   
+To make the timeout work properly, wal_receiver_status_interval
+must be enabled on the standby, and its value must be less than the
+value of replication_timeout.
+   
+  
+ 
  
 
 
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
the postgresql.conf file or on the server command line.
The default value is 10 seconds.
   
+  
+   When replication_timeout is enabled on the primary,
+   wal_receiver_status_interval must be enabled, and its value
+   must be less than the value of replication_timeout.
+  
   
  
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..b6dc8cc 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -56,9 +56,11 @@
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
  *		pq_getbyte_if_available - get a byte if available without blocking
+ *		pq_flush_if_writable	- flush pending output if writable without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include "miscadmin.h"
 #include "storage/ipc.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 
 /*
  * Configuration options
@@ -108,12 +111,15 @@ static char sock_path[MAXPGPATH];
  * Buffers for low-level I/O
  */
 
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static int	PqRecvLength;		/* End of data available in PqRecvBuffer */
 
@@ -142,7 +148,9 @@ static int	Setup_AF_UNIX(void);
 void
 pq_init(void)
 {
-	PqSendPointer = PqRecvPointer = PqRecvLength = 0;
+	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
+	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
+	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
 	on_proc_exit(pq_close, 0);
@@ -762,7 +770,7 @@ pq_recvbuf(void)
 		int			r;
 
 		r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-		PQ_BUFFER_SIZE - PqRecvLength);
+		PQ_RECV_BUFFER_SIZE - PqRecvLength);
 
 		if (r < 0)
 		{
@@ -1138,10 +1146,10 @@ internal_putbytes(const char *s, size_t len)
 	while (len > 0)
 	{
 		/* If buffer is full, then flush it out */
-		if (PqSendPointer >= PQ_BUFFER_SIZE)
+		if (PqSendPointer >= PqSendBufferSize)
 			if (internal_flush())
 return EOF;
-		amount = PQ_BUFFER_SIZE - PqSendPointer;
+		amount = PqSendBufferSize - PqSendPointer;
 		if (amount > len)
 			amount = len;
 		memcpy(PqSendBuffer + PqSendPointer, s, amount);
@@ -1172,12 
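Taken together, the documentation changes in this patch imply a configuration along these lines (the values shown are the documented defaults; this fragment is illustrative):

```
# postgresql.conf on the primary
replication_timeout = 60s            # default; 0 disables the timeout

# postgresql.conf on the standby
wal_receiver_status_interval = 10s   # default; must be non-zero and
                                     # less than replication_timeout
```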

Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-23 Thread Marti Raudsepp
On Tue, Mar 22, 2011 at 01:02, Tom Lane  wrote:
> I studied the code some more, and I think this probably can be made to
> work.  The basic idea is to have preprocess_minmax_aggregates build
> simplified queries like the above (working by modifying the query tree
> that exists at the point where it's called) and call query_planner on
> them.  Save aside the resulting path data, then let the regular planning
> process continue.  When optimize_minmax_aggregates is called, see
> whether the regular plan is cheaper than the sum of the path costs.
> If not, use the paths to construct a replacement plan, same as now.

Thanks a lot! I can confirm that this is fixed now in the git version, and
it now also works with partitioned tables, which is great news.

Regards,
Marti



Re: [HACKERS] Sync Rep v19

2011-03-23 Thread Fujii Masao
On Sat, Mar 19, 2011 at 11:28 AM, Robert Haas  wrote:
> On Fri, Mar 18, 2011 at 10:25 PM, Robert Haas  wrote:
>> On Tue, Mar 8, 2011 at 7:05 AM, Fujii Masao  wrote:
>>> * Smart shutdown
>>> Smart shutdown should wait for all the waiting backends to be acked, and
>>> should not cause them to forcibly exit. But this causes shutdown to hang
>>> indefinitely if there is no walsender at that time. To enable them to be acked
>>> even in that situation, we need to change postmaster so that it accepts the
>>> replication connection even during smart shutdown (until we reach
>>> PM_SHUTDOWN_2 state). Postmaster has already accepted the superuser
>>> connection to cancel backup during smart shutdown. So I don't think that
>>> the idea to accept the replication connection during smart shutdown is so
>>> ugly.
>>>
>>> * Fast shutdown
>>> I agree with you about fast shutdown. Fast shutdown should cause all the
>>> backends, including waiting ones, to exit immediately. At that time, a
>>> non-acked backend should not report success, according to the
>>> definition of sync rep. So we need to change the backend so that it removes
>>> itself from the waiting queue and exits before reporting success
>>> when it receives SIGTERM. This change causes the waiting backends to
>>> do the same even when pg_terminate_backend is called. But since
>>> they've not been acked yet, it seems to be reasonable to prevent them
>>> from returning the COMMIT.
>>>
>>> Comments? I'll create the patch barring objection.
>>
>> The fast smart shutdown part of this problem has been addressed.  The
>
> Ugh.  I mean "the fast shutdown", of course, not "the fast smart
> shutdown".  Anyway, point is:
>
> fast shutdown now OK
> smart shutdown still not OK
> do you want to write a patch?
>
> :-)
>
>> smart shutdown case still needs work, and I think the consensus was
>> that your proposal above was the best way to go with it.
>>
>> Do you still want to work up a patch for this?  If so, I can review.

Sure. Will do.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Fujii Masao
On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas  wrote:
> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
>  wrote:
>> You could also argue for "log a warning, continue until we can open for Hot
>> standby, then pause".
>
> I don't like that one much.
>
>> I can write the patch once we know what we want. All of those options sound
>> reasonable to me. This is such a corner-case that it doesn't make sense to
>> make it user-configurable, though.
>
> I agree.  Since pause_at_recovery_target is ignored when
> hot_standby=off, I think it would be consistent to treat the case
> where hot_standby=on but can't actually be initiated the same way -
> just ignore the pause request and enter normal running.

When hot_standby = on and the recovery target is ahead of the consistent point,
the server doesn't enter normal running, since a FATAL error happens. So I think
that it's more consistent to prevent the server from entering normal running
also when hot_standby = off.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Markus Wanner
On 03/22/2011 09:33 PM, Robert Haas wrote:
> We might have a version of synchronous replication that works this way
some day, but it's not the version we're shipping with 9.1.  The slave
> acknowledges the WAL records when they hit the disk (i.e. fsync) not
> when they are applied; WAL apply can lag arbitrarily.  The point is to
> guarantee clients that the WAL is on disk somewhere and that it will
> be replayed in the event of a failover.  Despite the fact that this
> doesn't work as you're describing, it's a useful feature in its own
> right.

In that sense, our approach may be more synchronous than most others,
because after the ACK is sent from the slave, the slave still needs to
apply the transaction data from WAL before it gets visible, while the
master needs to wait for the ACK to arrive at its side, before making it
visible there.

Ideally, these two latencies (disk seek and network induced) are just
about equal.  But of course, there's no such guarantee.  So whenever one
of the two is off by an order of magnitude or two (by use case or due to
a temporary overload), either the master or the slave may lag behind the
other machine.

What pleases me is that the guarantee from the slave is somewhat similar
to Postgres-R's: with its ACK, the receiving node doesn't guarantee the
transaction *is* applied locally, it just guarantees that it *will* be
able to do so sometime in the future.  Kind of a mind twister, though...

Regards

Markus
