Re: [HACKERS] Perf regression in 2.6.32 (Ubuntu 10.04 LTS)
Robert Haas wrote:
> Greg, have you run into any other evidence suggesting a problem with 2.6.32?

I haven't actually checked myself yet. Right now the only distribution shipping 2.6.32 usefully is Ubuntu 10.04, which I can't recommend anyone use on a server because their release schedules are way too aggressive to ever deliver stable versions anymore. So until either RHEL6 or Debian Squeeze ships, very late this year or early next, the performance of 2.6.32 is irrelevant to me. And by then I'm hoping that the early adopters will have squashed more of the obvious bugs here. 2.6.32 is 11 months old at this point, which makes it still a bleeding-edge kernel in my book.

-- 
Greg Smith, 2ndQuadrant US
g...@2ndquadrant.com
Baltimore, MD
PostgreSQL Training, Services and Support
www.2ndQuadrant.us

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
2010/10/7 Robert Haas :
> On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule wrote:
>> I am thinking you can remove "scrollable cursor support" from the
>> plpgsql ToDo. Scrollable cursors are supported, and the supported
>> syntax is the same as in the core SQL language.
>
> I agree, removed. I also removed WITH HOLD cursors, which we seem to
> have as well.

I don't think we support the WITH HOLD cursor syntax yet. Maybe we have similar functionality; I don't know.

Pavel
Re: [HACKERS] leaky views, yet again
On 07.10.2010 06:39, Robert Haas wrote:
> On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane wrote:
>> Right, *column* filtering seems easy and entirely secure. The angst
>> here is about row filtering. Can we have a view in which users can see
>> the values of a column for some rows, with perfect security that they
>> can't identify values for the hidden rows? The stronger form is that
>> they shouldn't even be able to tell that hidden rows exist, which is
>> something your view doesn't try to do; but there are at least some
>> applications where that would be desirable.
>
> I took a crack at documenting the current behavior; see attached.

Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too, so revoking access to CREATE FUNCTION does not make you safe.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] host name support in pg_hba.conf
(2010/10/06 10:21), KaiGai Kohei wrote:
> I'll check the patch for more details, please wait for a few days.

I found some issues in this patch, independent of the discussion about localhost. (About pg_hba.conf.sample, I'm sorry for the bad suggestion. Please fix it up according to Tom's comment.)

* The logic is still unclear to me.

check_hostname() immediately returns false if the reverse-resolved remote host name does NOT match the host name given in pg_hba.conf:

| +static bool
| +check_hostname(hbaPort *port, const char *hostname)
| +{
| +	struct addrinfo *gai_result, *gai;
| +	int		ret;
| +	bool	found;
| +
| +	/* Lookup remote host name if not already done */
| +	if (!port->remote_hostname)
| +	{
| +		char	remote_hostname[NI_MAXHOST];
| +
| +		if (pg_getnameinfo_all(&port->raddr.addr, port->raddr.salen,
| +							   remote_hostname, sizeof(remote_hostname),
| +							   NULL, 0,
| +							   0))
| +			return false;
| +
| +		port->remote_hostname = strdup(remote_hostname);
| +	}
| +
| +	if (strcmp(port->remote_hostname, hostname) != 0)
| +		return false;
| +
| +	/* Lookup IP from host name and check against original IP */

However, it seems to me you expected the opposite behavior. If the resolved host name matches the host name given in pg_hba.conf, we can consider this HbaLine a suitable configuration without any fallbacks, right? If so, it should be:

    if (strcmp(port->remote_hostname, hostname) == 0)
        return true;

In addition, we should run the rest of the fallback code only in the mismatch case, shouldn't we?

* Why getnameinfo() in the fallback loop?

The fallback code that runs when the host name matched (I believe this code is intended to handle the case when the host name did NOT match) calls getnameinfo() for each candidate remote address, but its result is not referenced by anybody. Is it really necessary?
| +	found = false;
| +	for (gai = gai_result; gai; gai = gai->ai_next)
| +	{
| +		char	hostinfo[NI_MAXHOST];
| +
| +		getnameinfo(gai->ai_addr, gai->ai_addrlen,
| +					hostinfo, sizeof(hostinfo),
| +					NULL, 0,
| +					NI_NUMERICHOST);
| +
| +		if (gai->ai_addr->sa_family == port->raddr.addr.ss_family)
| +		{
| +			if (gai->ai_addr->sa_family == AF_INET)
| +			{
| +				if (ipv4eq((struct sockaddr_in *) gai->ai_addr,
| +						   (struct sockaddr_in *) &port->raddr.addr))
| +				{
| +					found = true;
| +					break;
| +				}
| +			}
| +			else if (gai->ai_addr->sa_family == AF_INET6)
| +			{
| +				if (ipv6eq((struct sockaddr_in6 *) gai->ai_addr,
| +						   (struct sockaddr_in6 *) &port->raddr.addr))
| +				{
| +					found = true;
| +					break;
| +				}
| +			}
| +		}
| +	}
| +
| +	if (gai_result)
| +		freeaddrinfo(gai_result);
| +
| +	return found;
| +}

* Slash ('/') after the host name

In parse_hba_line(), the parsed token containing either a host name or a CIDR address is sliced into two parts at the first '/' character, if present. But even when cidr_slash is not NULL, it is silently ignored when the first half of the token is a host name rather than a numeric address:

| 	else
| 	{
| 		/* IP and netmask are specified */
| 		parsedline->ip_cmp_method = ipCmpMask;
|
| 		/* need a modifiable copy of token */
| 		token = pstrdup(token);
|
| 		/* Check if it has a CIDR suffix and if so isolate it */
| 		cidr_slash = strchr(token, '/');
| 		if (cidr_slash)
| 			*cidr_slash = '\0';
| 		:
| 		ret = pg_getaddrinfo_all(token, NULL, &hints, &gai_result);
| -		if (ret || !gai_result)
| +		if (ret == 0 && gai_result)
| +			memcpy(&parsedline->addr, gai_result->ai_addr,
| +				   gai_result->ai_addrlen);
| +		else if (ret == EAI_NONAME)
| +			parsedline->hostname = token;
| +		else
| +		{

This allows the following configuration to work without any errors (in fact, it works for me):

    # IPv4/6 local connections:
    host    all    all    kaigai.myhome.cx/today_is_sunny    trust

It seems to me we should raise an error if both cidr_slash and parsedline->hostname are non-NULL.
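For clarity, the control flow the review asks for can be sketched as a small self-contained program. The function and data below are illustrative stand-ins, not the patch's actual code: real DNS lookups are replaced by pre-resolved strings, and whether a reverse-only match is acceptable security-wise is exactly what is under discussion here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for check_hostname(): decide whether a client
 * matches a pg_hba.conf host-name entry, with lookups pre-resolved.
 */
static bool
check_hostname_sketch(const char *remote_hostname,  /* reverse lookup of client */
                      const char *hba_hostname,     /* name in pg_hba.conf */
                      const char *remote_addr,      /* client address as text */
                      const char **forward_addrs,   /* forward lookup of hba name */
                      int n_forward)
{
    /* Suggested fix: a reverse-lookup match alone decides positively. */
    if (strcmp(remote_hostname, hba_hostname) == 0)
        return true;

    /*
     * Fallback, reached only in the mismatch case: forward-resolve the
     * configured name and compare each candidate address with the
     * client's address.
     */
    for (int i = 0; i < n_forward; i++)
    {
        if (strcmp(forward_addrs[i], remote_addr) == 0)
            return true;
    }

    return false;
}
```

The point is simply the early `return true` on a reverse-lookup match, with the forward-resolution loop reached only on mismatch, inverting the patch's early `return false`.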
Thanks,
-- 
KaiGai Kohei
Re: [HACKERS] leaky views, yet again
On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane wrote:
> Right, *column* filtering seems easy and entirely secure. The angst
> here is about row filtering. Can we have a view in which users can see
> the values of a column for some rows, with perfect security that they
> can't identify values for the hidden rows? The stronger form is that
> they shouldn't even be able to tell that hidden rows exist, which is
> something your view doesn't try to do; but there are at least some
> applications where that would be desirable.

I took a crack at documenting the current behavior; see attached. It turns out that a view which only uses boolean operators in the WHERE clause is not obviously subvertable, because we judge those operations to have no cost. (It seems unwise to rely on this for security, though.) Anything more complicated - that does row filtering - is easily hacked. See within for details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment: document-leaky-views.patch (binary data)
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
On Mon, 4 Oct 2010, Marko Tiikkaja wrote:
> ... the patch does, modules start behaving weirdly. So what I'm
> suggesting is:
>
> - Deprecate pg_parse_and_rewrite(). I have no idea how the project has
>   done this in the past, but grepping the source code for "deprecated"
>   suggests that we just remove the function.
> - Introduce a new function, specifically designed for SQL functions.
>   Both callers of pg_parse_and_rewrite (init_sql_fcache and
>   fmgr_sql_validator) call check_sql_fn_retval after
>   pg_parse_and_rewrite, so I think we could merge those into the new
>   function.
>
> Does anyone have any concerns about this? Better ideas?
>
> Regards,
> Marko Tiikkaja

The only comment I've seen on this was from Tom, and his only concern was that old code wouldn't be able to compile against a new version of the function. What you propose (deleting pg_parse_and_rewrite and replacing it with a function of the new name) sounds fine. Since no one else has proposed a better idea and the commitfest is ticking away, I think you should go ahead and do that.

Steve
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule wrote:
> I am thinking you can remove "scrollable cursor support" from the
> plpgsql ToDo. Scrollable cursors are supported, and the supported
> syntax is the same as in the core SQL language.

I agree, removed. I also removed WITH HOLD cursors, which we seem to have as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] Bug / shortcoming in has_*_privilege
(2010/10/07 2:05), Alvaro Herrera wrote:
>>> Another thing that could raise eyebrows is that I chose to remove the
>>> "missing_ok" argument from get_role_oid_or_public, so it's not a perfect
>>> mirror of it. None of the current callers need it, but perhaps people
>>> would like these functions to be consistent.
>>
>> Tom Lane suggested adding a missing_ok argument, although it is not a
>> hard requirement.
>
> Actually I think he suggested the opposite.

Ahh, I misread his suggestion. Well, I'd like to mark this patch as 'ready for committer'.

Thanks,
-- 
KaiGai Kohei
Re: [HACKERS] security hook on table creation
(2010/10/07 6:21), Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of mié oct 06 17:02:22 -0400 2010:
>> 2010/10/5 KaiGai Kohei:
>>> However, we also have a few headache cases.
>>> DefineType() creates a new type object and its array type, but it does
>>> not call CommandCounterIncrement() by the end of the function, so the
>>> new type entries are not visible to the plugin modules, even if we put
>>> a security hook at the tail of DefineType().
>>> DefineFunction() has the same problem. It creates a new procedure
>>> object, but it also does not call CommandCounterIncrement() by the end
>>> of the function, except for the case when ProcedureCreate() invokes the
>>> language validator function.
>>
>> So I guess the first question here is why it's important to be able to
>> see the new entry. I am thinking that you want it so that, for example,
>> you can fetch the namespace OID to perform an SE-Linux type transition.
>> Is that right?
>
> I'm not sure that there's any point trying to optimize these to the
> point of avoiding CommandCounterIncrement. Surely DefineType et al are
> not performance-sensitive operations.

Performance optimization is not the point here. If we need to call CommandCounterIncrement() before invoking the security hooks, we also need to analyze its side effects and confirm that it is harmless. Even if it is harmless, I think it gives us a greater code-maintenance burden than the idea of two hooks on creation.

>> Maybe we need a variant of InvokeObjectAccessHook that does a CCI only
>> if a hook is present.
>
> The problem I see with this idea is that it becomes a lot harder to
> track down whether it occurred or not for any given operation.

Yes, it is a burden of code maintenance.

Thanks,
-- 
KaiGai Kohei
Re: [HACKERS] security hook on table creation
(2010/10/07 6:02), Robert Haas wrote:
> 2010/10/5 KaiGai Kohei:
>> I began to revise the security hooks, but I found a few cases that do
>> not work with the approach of post-creation security hooks.
>> If we try to fix up the core PG routines to suit the post-creation
>> security hooks, we will need to put more CommandCounterIncrement()
>> calls in various places, which made me doubtful whether this approach
>> really has minimal impact on the core PG routines.
>
> We definitely don't want to add CCIs without a good reason.
>
>> Some object classes have CommandCounterIncrement() just after the
>> routine that creates a new object. For example, DefineRelation() calls
>> it just after heap_create_with_catalog(), so the new relation entry is
>> visible to plugin modules using SearchSysCache(), as long as we put the
>> post-creation security hook after the CommandCounterIncrement().
>>
>> However, we also have a few headache cases.
>> DefineType() creates a new type object and its array type, but it does
>> not call CommandCounterIncrement() by the end of the function, so the
>> new type entries are not visible to the plugin modules, even if we put
>> a security hook at the tail of DefineType().
>> DefineFunction() has the same problem. It creates a new procedure
>> object, but it also does not call CommandCounterIncrement() by the end
>> of the function, except for the case when ProcedureCreate() invokes the
>> language validator function.
>
> So I guess the first question here is why it's important to be able to
> see the new entry. I am thinking that you want it so that, for example,
> you can fetch the namespace OID to perform an SE-Linux type transition.
> Is that right?
We assumed that the namespace OID can be fetched from the new pg_class entry, so we thought the common InvokeObjectAccessHook() does not need to take many arguments: we can pull out the relevant properties of the new object (such as its namespace OID) from the syscache using the OID of the new object. So InvokeObjectAccessHook() would instead have to deliver the OID of the namespace, rather than the OID of the new object, if the new object is not visible.

> Maybe we need a variant of InvokeObjectAccessHook that does a CCI only
> if a hook is present. I can't see that we're going to want to pay for
> that unconditionally, but it shouldn't surprise anyone that loading an
> enhanced security provider comes with some overhead.

The reason we tried to move the object-creation hooks to after object creation was to reduce the number of security hooks and the burden of code maintenance. However, it seems to me that paying attention to object visibility gives us more burden than having multiple creation hooks.

>> E.g, it may be possible to design creation of a relation as follows:
>>
>> DefineRelation(...)
>> {
>>     /* DAC permission checks here */
>>     :
>>     /* MAC permission checks; it returns its private data */
>>     opaque = check_relation_create(..);
>>     :
>>     /* insertion into pg_class catalog */
>>     relationId = heap_create_with_catalog(...);
>>     :
>>     /* assign security labels on the new object */
>>     InvokeObjectAccessHook(OBJECT_TABLE, OAT_POST_CREATE,
>>                            relationId, 0, opaque);
>> }
>
> I'd like to try to avoid that, if we can. That's going to make this
> code far more complex to understand and maintain.

Contrary to that feeling, I think the idea of two hooks helps us understand and maintain the security hooks in the future. The place to put the pre-creation hook is just after the default privilege checks, which is quite obvious. If we had only a post-creation hook, would it be obvious where we should put the security hook on function creation, for example?
In the case where we have two hooks, the pre-creation hook obviously goes after the DAC checks, and the post-creation hook after the insert/update of the system catalogs. That seems quite easy to understand and maintain.

Thanks,
-- 
KaiGai Kohei
Re: [HACKERS] patch: tsearch - some memory diet
Robert Haas writes:
> Nice. What was the overall effect on memory consumption?

Before:
cspell: 31352608 total in 3814 blocks; 37432 free (6 chunks); 31315176 used

After:
cspell: 16214816 total in 1951 blocks; 13744 free (12 chunks); 16201072 used

This is on a 32-bit machine that uses MAXALIGN 8, and I also had enable_cassert on (hence extra palloc chunk overhead) so it may be overstating the amount of savings you'd see in production. But it's in the same ballpark as what Pavel reported originally. AFAICT practically all of the useful savings comes from the one place he targeted originally, and the other changes are just marginal gravy.

Before I throw it away, here's some data about the allocations that go through that code on the Czech dictionary. First column is number of calls of the given size, second is requested size in bytes:

      1    1
    695    2
   1310    3
   1965    4
   2565    5
   1856    6
    578    7
    301    8
      7    9
      2   10
 707733   12
    520   16
 107035   20
     16   24
  22606   28
      3   32
   8814   36
     59   40
   4305   44
      2   48
   2238   52
      2   56
   1236   60
     20   64
    816   68
    599   76
      1   80
    408   84
      9   88
    334   92
      2   96
    246  100
      1  104
    164  108
     13  112
    132  116
    110  124
      1  128
    107  132
      3  136
     81  140
      1  144
     77  148
     40  156
     46  164
     29  172
     39  180
      2  184
     35  188
     31  196
     19  204
     16  212
     12  220
     10  228
      3  244
      1  304
      1  400
      1 1120

The spikes at 12/20/28 bytes correspond to SPNodes with 1/2/3 data items.

BTW, on a 64-bit machine we're really paying through the nose for the pointers in SPNodeData --- the pointers are bad enough, and their alignment effects are worse. If we were to try to change this over to a pointer-free representation, we could probably replace those pointers by 32-bit offsets, which would save a full factor of 2 on 64-bit.

regards, tom lane
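Tom's closing point about 32-bit offsets can be illustrated with a toy structure (hypothetical field names; the real SPNodeData is more involved than this):

```c
#include <stdint.h>

/*
 * Toy stand-in for a trie node that links children by pointer.  On a
 * 64-bit build the pointer is 8 bytes and forces 8-byte alignment,
 * padding the struct out to 16 bytes.
 */
struct node_ptr {
    struct node_ptr *child;
    uint32_t         val;
};

/*
 * Pointer-free variant: all nodes live in one contiguous array and
 * children are referenced by 32-bit index, so the struct stays 8 bytes
 * on both 32- and 64-bit builds.
 */
struct node_off {
    uint32_t child_idx;
    uint32_t val;
};

/* Hypothetical arena; child lookup becomes base + index. */
static struct node_off arena[16];

static struct node_off *
node_child(const struct node_off *n)
{
    return &arena[n->child_idx];
}
```

On a typical LP64 machine sizeof(struct node_ptr) is 16 versus 8 for the offset form, matching the "full factor of 2" estimate above.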
Re: [HACKERS] patch: tsearch - some memory diet
On Wed, Oct 6, 2010 at 7:36 PM, Tom Lane wrote:
> Teodor Sigaev writes:
>>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>>> on 64bit from 55MB to cca 27MB.
>
>> Good results. But, I think, there are more places in ispell to use
>> hold_memory():
>> - affixes and affix tree
>> - regis (REGex for ISpell, regis.c)
>
> I fixed the affix stuff as much as possible (some of the structures are
> re-palloc'd so they can't easily be included). It appears that hacking
> up regis, or any of the remaining allocations, wouldn't be worth the
> trouble. Using the Czech dictionary on a 32-bit machine, I see about
> 16MB going through the compacted-alloc code and only about 375K going
> through regular small palloc's.

Nice. What was the overall effect on memory consumption?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] patch: tsearch - some memory diet
Teodor Sigaev writes:
>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>> on 64bit from 55MB to cca 27MB.

> Good results. But, I think, there are more places in ispell to use
> hold_memory():
> - affixes and affix tree
> - regis (REGex for ISpell, regis.c)

I fixed the affix stuff as much as possible (some of the structures are re-palloc'd so they can't easily be included). It appears that hacking up regis, or any of the remaining allocations, wouldn't be worth the trouble. Using the Czech dictionary on a 32-bit machine, I see about 16MB going through the compacted-alloc code and only about 375K going through regular small palloc's.

regards, tom lane
Re: [HACKERS] patch: tsearch - some memory diet
Pavel Stehule writes:
> this simple patch reduces the persistent allocated memory for tsearch
> ispell dictionaries:
> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
> on 64bit from 55MB to cca 27MB.

Applied with revisions --- I got rid of the risky static state as per discussion, and extended the hackery to strings and Aff nodes as suggested by Teodor.

regards, tom lane
Re: [HACKERS] security hook on table creation
Excerpts from Robert Haas's message of mié oct 06 17:02:22 -0400 2010:
> 2010/10/5 KaiGai Kohei :
>> However, we also have a few headache cases.
>> DefineType() creates a new type object and its array type, but it does
>> not call CommandCounterIncrement() by the end of the function, so the
>> new type entries are not visible to the plugin modules, even if we put
>> a security hook at the tail of DefineType().
>> DefineFunction() has the same problem. It creates a new procedure
>> object, but it also does not call CommandCounterIncrement() by the end
>> of the function, except for the case when ProcedureCreate() invokes the
>> language validator function.
>
> So I guess the first question here is why it's important to be able to
> see the new entry. I am thinking that you want it so that, for
> example, you can fetch the namespace OID to perform an SE-Linux type
> transition. Is that right?

I'm not sure that there's any point trying to optimize these to the point of avoiding CommandCounterIncrement. Surely DefineType et al are not performance-sensitive operations.

> Maybe we need a variant of InvokeObjectAccessHook that does a CCI only
> if a hook is present.

The problem I see with this idea is that it becomes a lot harder to track down whether it occurred or not for any given operation.

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] security hook on table creation
2010/10/5 KaiGai Kohei :
> I began to revise the security hooks, but I found a few cases that do
> not work with the approach of post-creation security hooks.
> If we try to fix up the core PG routines to suit the post-creation
> security hooks, we will need to put more CommandCounterIncrement()
> calls in various places, which made me doubtful whether this approach
> really has minimal impact on the core PG routines.

We definitely don't want to add CCIs without a good reason.

> Some object classes have CommandCounterIncrement() just after the
> routine that creates a new object. For example, DefineRelation() calls
> it just after heap_create_with_catalog(), so the new relation entry is
> visible to plugin modules using SearchSysCache(), as long as we put the
> post-creation security hook after the CommandCounterIncrement().
>
> However, we also have a few headache cases.
> DefineType() creates a new type object and its array type, but it does
> not call CommandCounterIncrement() by the end of the function, so the
> new type entries are not visible to the plugin modules, even if we put
> a security hook at the tail of DefineType().
> DefineFunction() has the same problem. It creates a new procedure
> object, but it also does not call CommandCounterIncrement() by the end
> of the function, except for the case when ProcedureCreate() invokes the
> language validator function.

So I guess the first question here is why it's important to be able to see the new entry. I am thinking that you want it so that, for example, you can fetch the namespace OID to perform an SE-Linux type transition. Is that right?

Maybe we need a variant of InvokeObjectAccessHook that does a CCI only if a hook is present. I can't see that we're going to want to pay for that unconditionally, but it shouldn't surprise anyone that loading an enhanced security provider comes with some overhead.
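A minimal sketch of the "CCI only if a hook is present" variant floated above, with stub names standing in for the backend's CommandCounterIncrement() and object_access_hook (these definitions are illustrative assumptions, not actual PostgreSQL code):

```c
#include <stddef.h>

/* Stub counting CommandCounterIncrement() calls, so the overhead is observable. */
static int cci_count;
static void
CommandCounterIncrement_stub(void)
{
    cci_count++;
}

/* Hook slot, NULL unless an enhanced security provider is loaded. */
typedef void (*object_access_hook_type) (int objtype, unsigned objid);
static object_access_hook_type object_access_hook = NULL;

/* Sample provider hook, counting its invocations. */
static int hook_calls;
static void
sample_security_hook(int objtype, unsigned objid)
{
    (void) objtype;
    (void) objid;
    hook_calls++;
}

/*
 * The proposed variant: pay for the CCI only when a provider has
 * actually installed a hook, so unmodified installations see no
 * extra overhead.
 */
static void
InvokeObjectAccessHookWithCCI(int objtype, unsigned objid)
{
    if (object_access_hook != NULL)
    {
        CommandCounterIncrement_stub(); /* make the new catalog row visible */
        (*object_access_hook) (objtype, objid);
    }
}
```

This encodes the trade-off in the thread: with no hook loaded nothing happens, while a loaded provider gets a post-CCI view of the new object, at the cost that whether a CCI happened now depends on runtime state.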
> E.g, it may be possible to design creation of a relation as follows:
>
> DefineRelation(...)
> {
>     /* DAC permission checks here */
>     :
>     /* MAC permission checks; it returns its private data */
>     opaque = check_relation_create(..);
>     :
>     /* insertion into pg_class catalog */
>     relationId = heap_create_with_catalog(...);
>     :
>     /* assign security labels on the new object */
>     InvokeObjectAccessHook(OBJECT_TABLE, OAT_POST_CREATE,
>                            relationId, 0, opaque);
> }

I'd like to try to avoid that, if we can. That's going to make this code far more complex to understand and maintain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] host name support in pg_hba.conf
On Wed, Oct 6, 2010 at 6:21 AM, Magnus Hagander wrote:
> It's not common, but I've certainly come across a number of virtual
> machines where localhost resolves (through /etc/hosts) to the machine's
> "real" IP rather than 127.0.0.1, because 127.0.0.1 simply doesn't
> exist.

It's perfectly fine for localhost to resolve to the machine's external IP address. It would be weird for it to resolve to some other host's IP address, like the VM's host machine. But having 127.0.0.1 not exist would be positively broken. All kinds of things wouldn't work. Are you sure about that part?

-- 
greg
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 18:04 +0300, Heikki Linnakangas wrote:
> The key is whether you are guaranteed to have zero data loss or not.

We agree that is an important question. You seem willing to trade anything for that guarantee. I seek a more pragmatic approach that balances availability and risk. Those views are different, but not inconsistent. Oracle manages to offer multiple options and so can we.

If you desire that, go for it. But don't try to stop others having a simple, pragmatic approach. The code to implement your desired option is more complex and really should come later. I don't in any way wish to block that option in this release, or any other, but please don't try to persuade people it's the only sensible option 'cos it damn well isn't.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Issues with Quorum Commit
> Seems reasonable, but what is a CAP database?

A database built around the CAP theorem[1]: Cassandra, Dynamo, Hypertable, etc.

For us, the equation is CAD, as in Consistency, Availability, Durability. Pick any two, at best. But it's a very similar bag of issues to the ones CAP addresses.

[1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 09:04 PM, Dimitri Fontaine wrote:
> Ok so I think we're agreeing here: what I said amounts to proposing that
> the code work this way when the quorum is set up that way, and/or is
> able to reject any non-read-only transaction (those that need a real
> XID) until your standby is fully in sync.
>
> I'm just saying that this should be an option, not the only choice.

I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication instead. What does a synchronous replication mode that falls back to async upon failure give you, except for a severe degradation in performance during normal operation? Why not use async right away in such a case?

> Depends, lots of things out there work quite well in best-effort mode,
> even if some projects need more careful thinking. That's again the idea
> of waiting forever or just continuing; there's a middle ground, which is
> starting the system before reaching the durability requirements, or
> downgrading it to read-only, or even off, until you get them.

In such cases the admin should be free to reconfigure the quorum. And yes, a read-only mode might be feasible. Please just don't fool the admin with a "best effort" thing that guarantees nothing (but trouble).

> If you ask for a quorum larger than what the current standbys are able
> to deliver, and you're set to wait forever until the quorum is reached,
> you just blocked the master.

Correct. That's the intended behavior.

> Good news is that the quorum is a per-transaction setting

I definitely like the per-transaction thing.

> so opening a
> superuser connection to act on the currently waiting transaction is
> still possible (pass/fail, but what does fail mean at this point?
> shutdown to wait some more offline?).

Not sure I'm following here.
The admin will be busy re-establishing (connections to) standbys; killing transactions on the master doesn't help anything - whether or not the master waits forever.

Regards

Markus Wanner
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 20:57, Josh Berkus wrote:
> While it's nice to dismiss case (1) as an edge case, consider the
> likelihood of someone running PostgreSQL with fsync=off on cloud
> hosting. In that case, having k = N = 5 does not seem like an
> unreasonable arrangement if you want to ensure durability via
> replication. It's what the CAP databases do.

Seems reasonable, but what is a CAP database?

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] gist picksplit iteration
Marios Vodas writes:
> I would expect it to start from 0, since C arrays are 0-based.
> So my question is why does this happen?

Well, I don't have any good answer other than "it's the API". Time to have a look at some contrib code, and maybe some other code like ip4r or prefix (the former is fixed-size, the latter a varlena struct; pick the example that fits your case).

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner writes:
> There's no point in time I ever mind whether a standby is a "candidate"
> or not. Either I want to synchronously replicate to X standbys, or not.

Ok so I think we're agreeing here: what I said amounts to proposing that the code work this way when the quorum is set up that way, and/or is able to reject any non-read-only transaction (those that need a real XID) until your standby is fully in sync.

I'm just saying that this should be an option, not the only choice. And that by having a clear view of the system's state, it's possible to set out a clear error-response policy.

> This is an admin decision. Whether or not your standbys are up and
> running, existing or just about to be bought, that doesn't have any
> impact on your durability requirements.

Depends; lots of things out there work quite well in best-effort mode, even if some projects need more careful thinking. That's again the idea of waiting forever or just continuing; there's a middle ground, which is starting the system before reaching the durability requirements, or downgrading it to read-only, or even off, until you get them. You can read my proposal as a way to let our users choose between those two incompatible behaviours.

> Of course, it doesn't make sense to wait-forever on *every* standby that
> ever gets added. Quorum commit is required, yes (and that's what this
> thread is about, IIRC). But with quorum commit, adding a standby only
> improves availability, but certainly doesn't block the master in any
> way. (Quite the opposite: it can allow the master to continue, if it has
> been blocked before because the quorum hasn't been reached.)

If you ask for a quorum larger than what the current standbys are able to deliver, and you're set to wait forever until the quorum is reached, you just blocked the master.
Good news is that the quorum is a per-transaction setting, so opening a superuser connection to act on the currently waiting transaction is still possible (pass/fail, but what does fail mean at this point? shut down to wait some more offline?). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
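The interplay between quorum size and the number of connected standbys that Dimitri describes can be sketched as a tiny model. This is purely an illustration of the semantics under discussion, not PostgreSQL code; all names are hypothetical:

```python
# Minimal model of quorum commit: a transaction's commit is released once
# at least `quorum` standbys have acknowledged it. If the quorum is larger
# than the number of standbys able to acknowledge at all, a wait-forever
# master is blocked -- the situation described in the message above.

def commit_released(acks_received: int, quorum: int) -> bool:
    """True once enough standby acknowledgements have arrived."""
    return acks_received >= quorum

def can_ever_release(connected_standbys: int, quorum: int) -> bool:
    """False when the quorum exceeds the number of connected standbys,
    i.e. the master would wait forever."""
    return connected_standbys >= quorum
```

For example, with a quorum of 2 and a single connected standby, `can_ever_release(1, 2)` is false: the master blocks until another standby joins or an operator intervenes, which is exactly the failure mode being debated.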
Re: [HACKERS] Issues with Quorum Commit
Hello Dimitri, On 10/06/2010 05:41 PM, Dimitri Fontaine wrote: > - when do you start considering the standby as a candidate to your sync >rep requirements? That question doesn't make much sense to me. There's no point in time I ever mind if a standby is a "candidate" or not. Either I want to synchronously replicate to X standbys, or not. > Much of the discussion we're having takes as implicit that the > answer is "as soon as you know about its existence, that must be at the > pg_start_backup() point". This is an admin decision. Whether your standbys are up and running or not, existing or just about to be bought, that doesn't have any impact on your durability requirements. If you want your banking accounts data to be saved in at least two different locations, I think that's your requirement. You'd be quite unhappy if your bank lost your last month's salary, but stated: "hey, at least we didn't have any downtime". > And you can offer an option to guarantee the wait-forever behavior only > when it makes sense, rather than trying to catch your own tail as soon > as a standby is added in the mix Of course, it doesn't make sense to wait-forever on *every* standby that ever gets added. Quorum commit is required, yes (and that's what this thread is about, IIRC). But with quorum commit, adding a standby only improves availability, but certainly doesn't block the master in any way. (Quite the opposite: it can allow the master to continue, if it has been blocked before because the quorum hasn't been reached). Regards Markus Wanner
Re: [HACKERS] patch: tsearch - some memory diet
Robert Haas writes: > On Wed, Oct 6, 2010 at 1:40 PM, Tom Lane wrote: >> Robert Haas writes: >>> ...but I don't really see why that has to be done as part of this patch. >> >> Because patches that reduce maintainability seldom get cleaned up after. > I don't think you've made a convincing argument that this patch does > that, but if you're feeling motivated to go clean this up, I'm more > than happy to get out of the way. Yeah, I'll take it. regards, tom lane
Re: [HACKERS] Issues with Quorum Commit
All, Let me clarify and consolidate this discussion. Again, it's my goal that this thread specifically identify only the problems and desired behaviors for synch rep with more than one sync standby. There are several issues with even one sync standby which still remain unresolved, but I believe that we should discuss those on a separate thread, for clarity. I also strongly believe that we should get single-standby functionality committed and tested *first*, before working further on multi-standby. So, to summarize earlier discussion on this thread: There are 2 reasons to have more than one sync standby: 1) To increase durability above the level of a single synch standby, even at the cost of availability. 2) To increase availability without decreasing durability below the level offered by a single sync standby. The "pure" setup for each of these options, where N is the number of standbys and k is the number of acks required from standbys, is: 1) k = N, N > 1, apply 2) k = 1, N > 1, recv (Timeouts are a specific compromise of durability for availability on *one* server, and as such will not be discussed here. BTW, I was the one who suggested a timeout, rather than Simon, so if you don't like the idea, harass me about it.) Any configuration (3) other than the two above is a specific compromise between durability and availability, for example: 3a) k = 2, N = 3, fsync 3b) k = 3, N = 10, recv ... should give you better durability than case 2) and better availability than case 1). While it's nice to dismiss case (1) as an edge-case, consider the likelihood of someone running PostgreSQL with fsync=off on cloud hosting. In that case, having k = N = 5 does not seem like an unreasonable arrangement if you want to ensure durability via replication. It's what the CAP databases do. After eliminating some of my issues as non-issues, here's what we're left with for problems on the above: (1), (3) Accounting/Registration. 
Implementing any of these cases would seem to require some form of accounting and/or registration on the master in terms of, at a minimum, the number of acks for each data send. More likely we will need, as proposed on other threads, a register of standbys and the sync state of each. Not only will this accounting/registration code be hard to write, it will have at least *some* performance overhead. Whether that overhead is minor or substantial can only be determined through testing. Further, there's the issue of whether, and how, we transmit this register to the standbys so that they can be promoted. (2), (3) Degradation: (Jeff) these two cases make sense only if we give DBAs the tools they need to monitor which standbys are falling behind, and to drop and replace those standbys. Otherwise we risk giving DBAs false confidence that they have better-than-1-standby reliability when actually they don't. Current tools are not really adequate for this. (1), (3) Dynamic Re-configuration: we need the ability to add and remove standbys at runtime. We also need to have a verdict on how to handle the case where a transaction is pending, per Heikki. (2), (3) Promotion: all multi-standby high-availability cases only make sense if we provide tools to promote the most current standby to be the new master. Otherwise the whole cluster still goes down whenever we have to replace the master. We also should provide some mechanism for promoting an async standby to sync; this has already been discussed. (1) Consistency: this is another DBA-false-confidence issue. DBAs who implement (1) are liable to do so thinking that they are not only guaranteeing the consistency of every standby with the master, but the consistency of every standby with every other standby -- a kind of dummy multi-master. They are not, so it will take multiple reminders and workarounds in the docs to explain this. And we'll get complaints anyway. 
(1), (2), (3) Initialization: (Dimitri) we need a process whereby a standby can go from cloned to synched to being a sync rep standby, and possibly from degraded to synced again and back. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
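The accounting/registration requirement in Josh's summary can be illustrated with a toy master-side register. This is a hypothetical sketch of the concept only; the actual design was still under discussion, and LSNs are modeled as plain integers:

```python
# Toy standby register: the master tracks, per standby, the last WAL
# position (LSN) acknowledged. A commit waiting at `lsn` is released
# once at least k standbys have acknowledged that far.

class StandbyRegister:
    def __init__(self):
        self.acked = {}  # standby name -> last acknowledged LSN

    def report_ack(self, standby: str, lsn: int) -> None:
        # Acknowledgements only move forward, never backward.
        self.acked[standby] = max(lsn, self.acked.get(standby, 0))

    def acks_for(self, lsn: int) -> int:
        # How many registered standbys have acked at least this LSN.
        return sum(1 for pos in self.acked.values() if pos >= lsn)

    def commit_ok(self, lsn: int, k: int) -> bool:
        return self.acks_for(lsn) >= k
```

Even this toy version shows where the overhead comes from: every acknowledgement updates shared state, and every waiting commit re-checks it.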
Re: [HACKERS] patch: tsearch - some memory diet
On Wed, Oct 6, 2010 at 1:40 PM, Tom Lane wrote: > Robert Haas writes: >> I think it would be cleaner to get rid of checkTmpCtx() and instead >> have dispell_init() set up and tear down the temporary context, > > What I was thinking of doing was getting rid of the static variable > altogether: we should do what you say above, but in the form of a > state struct that's created and destroyed by additional calls from > dispell_init(). Then that state struct could also carry the > infrastructure for this additional hack. It's a little more notation to > pass an additional parameter through all these routines, but from the > standpoint of understandability and maintainability it's clearly worth > it. > >> void NISetupForDictionaryLoad(); >> void NICleanupAfterDictionaryLoad(); > > More like > > NISpellState *NISpellInit(); > NISpellTerm(NISpellState *stat); > >> ...but I don't really see why that has to be done as part of this patch. > > Because patches that reduce maintainability seldom get cleaned up after. I don't think you've made a convincing argument that this patch does that, but if you're feeling motivated to go clean this up, I'm more than happy to get out of the way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] patch: tsearch - some memory diet
Robert Haas writes: > I think it would be cleaner to get rid of checkTmpCtx() and instead > have dispell_init() set up and tear down the temporary context, What I was thinking of doing was getting rid of the static variable altogether: we should do what you say above, but in the form of a state struct that's created and destroyed by additional calls from dispell_init(). Then that state struct could also carry the infrastructure for this additional hack. It's a little more notation to pass an additional parameter through all these routines, but from the standpoint of understandability and maintainability it's clearly worth it. > void NISetupForDictionaryLoad(); > void NICleanupAfterDictionaryLoad(); More like NISpellState *NISpellInit(); NISpellTerm(NISpellState *stat); > ...but I don't really see why that has to be done as part of this patch. Because patches that reduce maintainability seldom get cleaned up after. regards, tom lane
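The refactoring Tom sketches, replacing a static variable with a state struct created and destroyed around the dictionary load, is a general pattern that can be shown in a language-neutral way. The sketch below is a loose Python analogue of the NISpellInit/NISpellTerm proposal; all names are hypothetical and the real code is C inside spell.c:

```python
# Instead of a module-level (static) variable holding temporary
# dictionary-load state, the caller creates an explicit state object,
# threads it through the load routines, and destroys it when done.
# The lifetime of the temporary state is then visible in the code
# rather than implicit in a global.

class SpellState:
    """Per-load temporary state (analogous to the proposed NISpellState)."""
    def __init__(self):
        self.tmp = []  # transient allocations made during the load

def spell_init() -> SpellState:
    # Analogue of NISpellInit(): create fresh per-load state.
    return SpellState()

def load_entry(state: SpellState, word: str) -> None:
    # All transient work goes through the passed-in state, never a global.
    state.tmp.append(word)

def spell_term(state: SpellState) -> int:
    # Analogue of NISpellTerm(): tear down the temporary state.
    n = len(state.tmp)
    state.tmp.clear()
    return n
```

The payoff Tom describes is exactly this: once the state is a parameter, there is no unstated assumption about when a static variable gets reset per dictionary.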
Re: [HACKERS] Bug / shortcoming in has_*_privilege
Excerpts from KaiGai Kohei's message of mar oct 05 00:06:05 -0400 2010: > (2010/09/07 6:16), Alvaro Herrera wrote: > > Excerpts from Jim Nasby's message of jue jun 10 17:54:43 -0400 2010: > >> test...@workbook=# select has_table_privilege( 'public', 'test', 'SELECT' > >> ); > >> ERROR: role "public" does not exist > > > > Here's a patch implementing this idea. > > > I checked this patch. Thanks. > It seems to me it replaces the whole of get_role_oid() in the has_*_privilege > functions with the new get_role_oid_or_public(), so this patch allows > accepting the pseudo "public" user in a consistent way. Yes. > The pg_has_role_*() functions are an exception. They will raise an error > with the message "role "public" does not exist". > That is the expected behavior, isn't it? Yes. You cannot grant "public" to roles; according to the definition of public, this doesn't make sense. Accordingly, I chose to reject "public" as an input for pg_has_role and friends. > > Another thing that could raise eyebrows is that I chose to remove the > > "missing_ok" argument from get_role_oid_or_public, so it's not a perfect > > mirror of it. None of the current callers need it, but perhaps people > > would like these functions to be consistent. > > > Tom Lane suggested to add missing_ok argument, although it is not a must- > requirement. Actually I think he suggested the opposite. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Wed, Oct 6, 2010 at 12:26 PM, Greg Smith wrote: > Now, the more relevant question, what I actually need in order for a Sync > Rep feature in 9.1 to be useful to the people who want it most I talk to. > That would be a simple to configure setup where I list a subset of > "important" nodes, and the appropriate acknowledgement level I want to hear > from one of them. And when one of those nodes gives that acknowledgement, > commit on the master happens too. That's it. For use cases like the > commonly discussed "two local/two remote" situation, the two remote ones > would be listed as the important ones. That sounds fine to me. How do the details work? Each slave publishes a name to the master via a recovery.conf parameter, and the master has a GUC listing the names of the important slaves? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] patch: tsearch - some memory diet
On Mon, Oct 4, 2010 at 2:05 AM, Pavel Stehule wrote: > 2010/10/4 Robert Haas : >> On Oct 3, 2010, at 7:02 PM, Tom Lane wrote: >>> It's not at all apparent that the code is even >>> safe as-is, because it's depending on the unstated assumption that that >>> static variable will get reset once per dictionary. The documentation >>> is horrible: it doesn't really explain what the patch is doing, and what >>> it does say is wrong. >> >> Yep. We certainly would need to convince ourselves that this is correct >> before applying it, and that is all kinds of non-obvious. >> > > What is good documentation? > > This patch doesn't do more than remove palloc overhead on just > one structure of the TSearch2 ispell dictionary. It isn't related to some > static variable; the most important fact is that this memory is > deallocated by dropping the memory context. After looking at this a bit more, I don't think it's too hard to fix up the comments so that they reflect what's actually going on here. For this patch to be correct, the only thing we really need to believe is that no one is going to try to pfree an SPNode, which seems like something we ought to be able to convince ourselves of. I don't see how the fact that some of the memory may get allocated out of a palloc'd chunk from context X rather than from context X directly can really cause any problems otherwise. The existing code already depends on the unstated assumption that the static variable will get reset once per dictionary; we're not making that any worse. I think it would be cleaner to get rid of checkTmpCtx() and instead have dispell_init() set up and tear down the temporary context, leaving NULL behind in the global variable after it's torn down, perhaps by having spell.c publish an API like this: void NISetupForDictionaryLoad(); void NICleanupAfterDictionaryLoad(); ...but I don't really see why that has to be done as part of this patch. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] standby registration (was: is sync rep stalled?)
Josh Berkus wrote: However, I think we're getting way the heck away from how far we really want to go for 9.1. Can I point out to people that synch rep is going to involve a fair bit of testing and debugging, and that maybe we don't want to try to implement The World's Most Configurable Standby Spec as a first step? I came up with the following initial spec for Most Configurable Standby Setup Ever recently: -The state of all available standby systems is exposed via a table-like interface, probably an SRF. -As each standby reports back a result, its entry in the table is updated with what level of commit it has accomplished (recv, fsync, etc.) -The table-like list of standby states is then passed to a function, that you could write in SQL or whatever else makes you happy. The function returns a boolean for whether sufficient commit guarantees have been met yet. You can make the conditions required as complicated as you like. -Once that function returns true, commit on the master. Otherwise return to waiting for standby responses. So that's what I actually want here, because all subsets of it proposed so far are way too boring. If you cannot express every possible standby situation that anyone will ever think of via an arbitrary function hook, obviously it's not worth building at all. Now, the more relevant question: what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people I talk to who want it most. That would be a simple to configure setup where I list a subset of "important" nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed "two local/two remote" situation, the two remote ones would be listed as the important ones. 
Until something that simple is committed, tested, debugged, and has had some run-ins with the real world, I have minimal faith that an attempt at anything more complicated has sufficient information to succeed. And complete faith that even trying will fail to deliver something for 9.1. The scope creep that seems to be happening here in the name of "this will be hard to change so it must be right in the first version" boggles my mind. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
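Both of Greg's proposals, the maximal "arbitrary function over a table of standby states" and the minimal "one ack from any important node", reduce to a policy function evaluated over the standby-state table. The sketch below is a hypothetical model of that idea, not anything in PostgreSQL; the state representation and level names are assumptions for illustration:

```python
# Model each standby's state as a dict, e.g.
#   {"name": "remote1", "level": "fsync"}
# where "level" records how far this standby has acknowledged the
# pending commit (None, "recv", "fsync", "apply"). A policy function
# over the list of states decides whether the master may commit.
# Greg's minimal proposal is just one particular policy.

LEVELS = {None: 0, "recv": 1, "fsync": 2, "apply": 3}

def one_important_acked(standbys, important, required_level):
    """Release the commit once any 'important' node has acknowledged
    at required_level or stronger."""
    need = LEVELS[required_level]
    return any(s["name"] in important and LEVELS[s["level"]] >= need
               for s in standbys)
```

The master's commit loop would re-evaluate the policy as acknowledgements arrive; the maximal design simply lets the user supply their own function in place of `one_important_acked`.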
Re: [HACKERS] Issues with Quorum Commit
Heikki Linnakangas writes: > I'm sorry, but I still don't understand the use case you're envisioning. How > many standbys are there? What are you trying to achieve with synchronous > replication over what asynchronous offers? Sorry if I've been unclear; I read loads of messages then tried to pick up the right one to answer, and obviously missed to spell out some context. My concern starts with only 1 standby, and is in fact 2 questions: - Why oh why wouldn't you be able to fix your sync setup in the master as soon as there's a standby doing a base backup? - when do you start considering the standby as a candidate to your sync rep requirements? Much of the discussion we're having takes as implicit that the answer is "as soon as you know about its existence, that must be at the pg_start_backup() point". I claim that's incorrect, and you can't ask the master to wait forever until the standby is in sync. All the more so because there's a window with wal_keep_segments here too, so the sync might never happen. To solve that problem, I propose managing the current state of the standby. That means auto registration of any standby, a feedback loop at more stages, and some protocol arbitrage for the standby to be able to say "I'm this far actually" so that the master can know how to consider it, rather than just demote it while live. Once you have a clear list of possible states for a standby, and can decide on what errors mean in terms of transitions in the state machine, you're able to decide when wait forever is an option and when you should ignore it or refuse any side-effect transaction commit. And you can offer an option to guarantee the wait-forever behavior only when it makes sense, rather than trying to catch your own tail as soon as a standby is added in the mix, with the proposals I've read on how you can't even restart the master at this point. 
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 04:20 PM, Simon Riggs wrote: > Ending the wait state does not cause data loss. It puts you at *risk* of > data loss, which is a different thing entirely. These kinds of risk scenarios are what sync replication is all about. A minimum guarantee that doesn't hold in face of the first few failures (see Jeff's argument) isn't worth a dime. Keep in mind that upon failure, the other nodes presumably get more load. As has been seen with RAID, that easily leads to subsequent failures. Sync rep needs to be able to protect against that *as well*. > If you want to avoid data loss you use N+k redundancy and get on with > life, rather than sitting around waiting. With that notion, I'd argue that quorum_commit needs to be set to exactly k, because any higher value would only cost performance without any useful benefit. But if I want at least k ACKs and if I think it's worth the performance penalty that brings during normal operation, I want that guarantee to hold true *especially* in case of an emergency. If availability is more important, you need to increase N and make sure enough of these (asynchronously) replicated nodes stay up. Increase k (thus quorum commit) for a stronger durability guarantee. > Putting in a feature for people that choose k=0 seems wasteful to me, > since they knowingly put themselves at risk in the first place. Given the above logic, k=0 equals completely async replication. Not sure what's wrong about that. Regards Markus Wanner
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 18:02, Dimitri Fontaine wrote: Heikki Linnakangas writes: 1. base-backup — self explaining 2. catch-up — getting the WAL to catch up after base backup 3. wanna-sync — don't yet have all the WAL to get in sync 4. do-sync — all WALs are there, coming soon 5. ok (async | recv | fsync | reply — feedback loop engaged) So you only consider that a standby is a candidate for sync rep when it's reached the ok state, and that's when it's able to fill the feedback loop we've been talking about. Standby state != ok, no waiting no nothing, it's *not* a standby as far as the master is concerned. You're not going to get zero data loss that way. Can you elaborate what the use case for that mode is? You can't pretend to sync with zero data loss until the standby is ready for it, or you need to take the site down while you add your standby. I can see some user willing to take the site down while doing the base backup dance then waiting for initial sync, then only accepting traffic and being secure against data loss, but I'd much rather that be an option and you could watch for your standby's state in a system view. Meanwhile, I can't understand any reason for the master to pretend it can safely manage any sync-rep transaction while there's no standby around. Either you wait for the quorum and don't have it, or you have to track standby states with precision and maybe actively reject writes. I'm sorry, but I still don't understand the use case you're envisioning. How many standbys are there? What are you trying to achieve with synchronous replication over what asynchronous offers? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 17:20, Simon Riggs wrote: On Wed, 2010-10-06 at 15:26 +0300, Heikki Linnakangas wrote: You're not going to get zero data loss that way. Ending the wait state does not cause data loss. It puts you at *risk* of data loss, which is a different thing entirely. Looking at it that way, asynchronous replication just puts you at risk of data loss too, it doesn't necessarily mean you get data loss. The key is whether you are guaranteed to have zero data loss or not. If you don't wait forever, you're not guaranteed zero data loss. It's just best effort, like asynchronous replication. The situation you want to avoid is that the master dies, and you don't know if you have suffered data loss or not. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Issues with Quorum Commit
Heikki Linnakangas writes: >> 1. base-backup — self explaining >> 2. catch-up — getting the WAL to catch up after base backup >> 3. wanna-sync — don't yet have all the WAL to get in sync >> 4. do-sync — all WALs are there, coming soon >> 5. ok (async | recv | fsync | reply — feedback loop engaged) >> >> So you only consider that a standby is a candidate for sync rep when >> it's reached the ok state, and that's when it's able to fill the >> feedback loop we've been talking about. Standby state != ok, no waiting >> no nothing, it's *not* a standby as far as the master is concerned. > > You're not going to get zero data loss that way. Can you elaborate what the > use case for that mode is? You can't pretend to sync with zero data loss until the standby is ready for it, or you need to take the site down while you add your standby. I can see some user willing to take the site down while doing the base backup dance then waiting for initial sync, then only accepting traffic and being secure against data loss, but I'd much rather that be an option and you could watch for your standby's state in a system view. Meanwhile, I can't understand any reason for the master to pretend it can safely manage any sync-rep transaction while there's no standby around. Either you wait for the quorum and don't have it, or you have to track standby states with precision and maybe actively reject writes. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
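Dimitri's five standby states form a simple linear progression, with the rule that only a standby in state 5 ("ok") participates in the sync-rep feedback loop. A sketch of that lifecycle, using the state names from his list (the transition rules are an assumption for illustration, not implemented code):

```python
# Standby lifecycle from Dimitri's list: a standby advances through
# these states in order, and only a standby in "ok" counts toward the
# master's sync-rep quorum -- any other state means the master treats
# it as "not a standby" for waiting purposes.

STATES = ["base-backup", "catch-up", "wanna-sync", "do-sync", "ok"]

def advance(state: str) -> str:
    """Move to the next lifecycle state ('ok' is terminal here)."""
    i = STATES.index(state)
    return STATES[min(i + 1, len(STATES) - 1)]

def counts_for_quorum(state: str) -> bool:
    """Only a fully synced standby participates in sync rep."""
    return state == "ok"
```

A real implementation would also need backward transitions (a standby falling behind drops out of "ok"), which is exactly the demotion/error-policy question being debated in this thread.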
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 15:26 +0300, Heikki Linnakangas wrote: > You're not going to get zero data loss that way. Ending the wait state does not cause data loss. It puts you at *risk* of data loss, which is a different thing entirely. If you want to avoid data loss you use N+k redundancy and get on with life, rather than sitting around waiting. Putting in a feature for people that choose k=0 seems wasteful to me, since they knowingly put themselves at risk in the first place. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] host name support in pg_hba.conf
On 10/06/2010 09:49 AM, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: That appears to me to be a broken (non RFC compliant) VM setup. However, maybe what this is telling us is we need to expose the setting? Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. Yeah, I'd be happier if we exposed it, to be honest. Either that, or figure out a way to get rid of it entirely by using a different method, but that's a much bigger issue. Please don't expose it. It will be a source of yet more confusion. People already get confused by the difference between listening addresses and pg_hba.conf addresses. It's one of the most frequent points of confusion seen on IRC. Adding another address to configure will just compound the confusion badly. I much prefer Tom's last suggestion. cheers andrew
Re: [HACKERS] host name support in pg_hba.conf
* Tom Lane (t...@sss.pgh.pa.us) wrote: > That appears to me to be a broken (non RFC compliant) VM setup. > However, maybe what this is telling us is we need to expose the setting? > Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. Yeah, I'd be happier if we exposed it, to be honest. Either that, or figure out a way to get rid of it entirely by using a different method, but that's a much bigger issue. Thanks, Stephen
Re: [HACKERS] host name support in pg_hba.conf
On Wed, Oct 6, 2010 at 15:34, Tom Lane wrote: > Magnus Hagander writes: >> On Wed, Oct 6, 2010 at 15:16, Tom Lane wrote: >>> However, the usage in pgstat.c is hard-wired, meaning that if you >>> have a configuration where "localhost" doesn't resolve correctly >>> for whatever reason, there's no simple recourse to get the stats >>> collector working. So ISTM there is an argument for changing that. > >> Well, hardcoding it will break the (unusual) case when localhost isn't >> 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and >> ipv6). > > You didn't read what I wrote before. Those numeric addresses define the > loopback address, *not* "localhost". When localhost fails to resolve > as those address(es), it's localhost that is wrong. We have actually > seen this in the field with bogus DNS providers. > >> It's not common, but i've certainly come across a number of virtual >> machines where localhost resolves (through /etc/hosts) to the machine's >> "real" IP rather than 127.0.0.1, because 127.0.0.1 simply doesn't >> exist. > > That appears to me to be a broken (non RFC compliant) VM setup. Can't argue with that. But it exists. > However, maybe what this is telling us is we need to expose the setting? > Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. That was kind of my point, that yes, we probably need to do one of those at least. Today it is "kind of exposed", because you can edit /etc/hosts - you don't need to rely on DNS for it. I just don't want to lose that ability. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] host name support in pg_hba.conf
Magnus Hagander writes: > On Wed, Oct 6, 2010 at 15:16, Tom Lane wrote: >> However, the usage in pgstat.c is hard-wired, meaning that if you >> have a configuration where "localhost" doesn't resolve correctly >> for whatever reason, there's no simple recourse to get the stats >> collector working. So ISTM there is an argument for changing that. > Well, hardcoding it will break the (unusual) case when localhost isn't > 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and > ipv6). You didn't read what I wrote before. Those numeric addresses define the loopback address, *not* "localhost". When localhost fails to resolve as those address(es), it's localhost that is wrong. We have actually seen this in the field with bogus DNS providers. > It's not common, but i've certainly come across a number of virtual > machines where localhost resolves (through /etc/hosts) to the machine's > "real" IP rather than 127.0.0.1, because 127.0.0.1 simply doesn't > exist. That appears to me to be a broken (non RFC compliant) VM setup. However, maybe what this is telling us is we need to expose the setting? Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. regards, tom lane
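Tom's suggested fallback order, 127.0.0.1 first, then ::1, then "localhost", can be sketched with the standard resolver. This is an illustrative model only; pgstat.c is C and its actual connection logic differs, and the function name and port here are hypothetical:

```python
import socket

def resolve_loopback(port: int = 5432):
    """Try candidate loopback targets in the suggested order and
    return (candidate, sockaddr) for the first one the resolver
    accepts. The numeric forms resolve without consulting DNS, so a
    broken "localhost" entry no longer blocks the fallback."""
    for candidate in ("127.0.0.1", "::1", "localhost"):
        try:
            infos = socket.getaddrinfo(candidate, port,
                                       type=socket.SOCK_DGRAM)
        except socket.gaierror:
            continue  # this candidate doesn't resolve; try the next
        if infos:
            return candidate, infos[0][4]
    return None
```

On a normal host the first candidate wins immediately; "localhost" only gets consulted if both numeric loopback addresses are unusable, which inverts the failure mode Tom describes (bogus DNS breaking the stats collector).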
Re: [HACKERS] host name support in pg_hba.conf
On Wed, Oct 6, 2010 at 15:16, Tom Lane wrote: > Andrew Dunstan writes: >> On 10/06/2010 04:05 AM, Peter Eisentraut wrote: >>> On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: So far as I can find, there is *no* standard mandating that localhost means the loopback address. > >>> Should we then change pgstat.c to use IP addresses instead of hardcoding >>> "localhost"? > >> I understood Tom to be saying we should not rely on "localhost" for >> authentication, not that we shouldn't use it at all. > > I think it's all right to use it as the default value for > listen_addresses, because (1) it's an understandable default, > and (2) users can change the setting if it doesn't work. > > However, the usage in pgstat.c is hard-wired, meaning that if you > have a configuration where "localhost" doesn't resolve correctly > for whatever reason, there's no simple recourse to get the stats > collector working. So ISTM there is an argument for changing that. Well, hardcoding it will break the (unusual) case when localhost isn't 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and ipv6). It's not common, but i've certainly come across a number of virtual machines where localhost resolves (through /etc/hosts) to the machine's "real" IP rather than 127.0.0.1, because 127.0.0.1 simply doesn't exist. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] host name support in pg_hba.conf
Andrew Dunstan writes: > On 10/06/2010 04:05 AM, Peter Eisentraut wrote: >> On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >>> So far as I can find, there is *no* standard >>> mandating that localhost means the loopback address. >> Should we then change pgstat.c to use IP addresses instead of hardcoding >> "localhost"? > I understood Tom to be saying we should not rely on "localhost" for > authentication, not that we shouldn't use it at all. I think it's all right to use it as the default value for listen_addresses, because (1) it's an understandable default, and (2) users can change the setting if it doesn't work. However, the usage in pgstat.c is hard-wired, meaning that if you have a configuration where "localhost" doesn't resolve correctly for whatever reason, there's no simple recourse to get the stats collector working. So ISTM there is an argument for changing that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] host name support in pg_hba.conf
Peter Eisentraut writes: > On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >> So far as I can find, there is *no* standard >> mandating that localhost means the loopback address. > Should we then change pgstat.c to use IP addresses instead of hardcoding > "localhost"? Hm, perhaps so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 15:22, Dimitri Fontaine wrote: What is necessary here is a clear view on the possible states that a standby can be in at any time, and we must stop trying to apply to some non-ready standby the behavior we want when it's already in-sync. From my experience operating Londiste, those states would be: 1. base-backup — self-explanatory 2. catch-up — getting the WAL to catch up after base backup 3. wanna-sync — don't yet have all the WAL to get in sync 4. do-sync — all WALs are there, coming soon 5. ok (async | recv | fsync | reply — feedback loop engaged) So you only consider that a standby is a candidate for sync rep when it's reached the ok state, and that's when it's able to fill the feedback loop we've been talking about. Standby state != ok, no waiting no nothing, it's *not* a standby as far as the master is concerned. You're not going to get zero data loss that way. Can you elaborate on what the use case for that mode is? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner writes: > On 10/06/2010 04:31 AM, Simon Riggs wrote: >> That situation would require two things >> * First, you have set up async replication and you're not monitoring it >> properly. Shame on you. > > The way I read it, Jeff is complaining about the timeout you propose > that effectively turns sync into async replication in case of a failure. > > With a master that waits forever, the standby that's newly required for > quorum certainly still needs its time to catch up. But it wouldn't live > in danger of being "optimized away" for availability in case it cannot > catch up within the given timeout. It's a tradeoff between availability > and durability. What is necessary here is a clear view on the possible states that a standby can be in at any time, and we must stop trying to apply to some non-ready standby the behavior we want when it's already in-sync. From my experience operating Londiste, those states would be: 1. base-backup — self-explanatory 2. catch-up — getting the WAL to catch up after base backup 3. wanna-sync — don't yet have all the WAL to get in sync 4. do-sync — all WALs are there, coming soon 5. ok (async | recv | fsync | reply — feedback loop engaged) So you only consider that a standby is a candidate for sync rep when it's reached the ok state, and that's when it's able to fill the feedback loop we've been talking about. Standby state != ok, no waiting no nothing, it's *not* a standby as far as the master is concerned. The other states allow us to manage accepting a new standby into an existing setup, and to handle failures. When we stop receiving the feedback loop events, the master knows the slave ain't in the "ok" state any more and can demote it to "wanna-sync", because it has to keep WALs until the slave comes back. If the standby is not back online and wal_keep_segments makes it so that we can't keep its WAL anymore, the state gets back to "base-backup". 
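A minimal sketch of this lifecycle, for concreteness. The state names follow Dimitri's list, but the transition logic shown (demotion on feedback loss, falling back to base-backup when the WAL runs out) is only one reading of the message, not an actual design:

```python
from enum import Enum, auto

# Sketch of the standby lifecycle described above; state names follow the
# message, the transition helper is an illustrative interpretation only.
class StandbyState(Enum):
    BASE_BACKUP = auto()   # taking the base backup
    CATCH_UP = auto()      # replaying WAL to catch up after base backup
    WANNA_SYNC = auto()    # missing some WAL needed to get in sync
    DO_SYNC = auto()       # all WAL available, about to be in sync
    OK = auto()            # feedback loop engaged (async/recv/fsync/reply)

def on_feedback_lost(state, wal_still_available):
    # Losing the feedback loop demotes an OK standby to wanna-sync; if
    # wal_keep_segments can no longer cover it, it is back to base-backup.
    if state is StandbyState.OK:
        return (StandbyState.WANNA_SYNC if wal_still_available
                else StandbyState.BASE_BACKUP)
    return state
```

Only a standby in the OK state would count toward any sync-rep decision; every other state is invisible to the master's commit path.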
Not going into every detail here (for example, we might need some protocol arbitration for the standby to be able to explain to the master that it's ok even if the master thinks it's not), but my point is that without a clear list of standby states, we're going to hinder the master in situations where it makes no sense to do so. > Or do you envision any use case that requires a quorum of X standbies > for normal operation but is just fine with only none to (X-1) standbies > in case of failures? IMO that's when sync replication is most needed and > when it absolutely should hold to its promises - even if it means to > stop the system. > > There's no point in continuing operation if you cannot guarantee the > minimum requirements for durability. If you happen to want such a thing, > you had better rethink your minimum requirement (as performance for > normal operations might benefit from a lower minimum as well). This part of the discussion made me think of yet another refinement on the Quorum Commit idea, even if I'm beginning to think that can be material for later. Basic Quorum Commit is having each transaction on the master wait for a total number of votes before accepting the transaction as synced. Each standby has a weight, meaning 1 or more votes. The problem is the flexibility isn't there, some cases are impossible to set up. Also people want to be able to specify their favorite standby and that quickly gets awkward. Idea: segment the votes into "colors" or any categories you like. Have each standby be a member of a category list, and require per-category quorums to be reached. This is the same as attributing roles to standbys and saying that they're all equivalent as soon as they're part of the given role, with the added flexibility that you can sometimes want more than one standby of a given role to take part in the quorum. 
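The per-category quorum idea can be sketched in a few lines. Everything here (the function name, the category labels) is illustrative only, not proposed syntax or configuration:

```python
# Illustrative-only sketch of the "colored" quorum: each standby carries a
# set of category labels, and a commit is acknowledged only once every
# category has gathered its own required number of votes.  None of these
# names correspond to real or proposed PostgreSQL settings.
def quorum_satisfied(acks, standby_categories, required):
    """acks: names of standbys that acknowledged this commit.
    standby_categories: standby name -> set of category labels.
    required: category label -> votes needed in that category."""
    counts = {cat: 0 for cat in required}
    for name in acks:
        for cat in standby_categories.get(name, ()):
            if cat in counts:
                counts[cat] += 1
    return all(counts[cat] >= need for cat, need in required.items())
```

A plain quorum is the degenerate case with one category covering every standby; naming a "favorite" standby is a category containing only it, with a required count of 1.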
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] host name support in pg_hba.conf
On 10/06/2010 04:05 AM, Peter Eisentraut wrote: On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: So far as I can find, there is *no* standard mandating that localhost means the loopback address. Should we then change pgstat.c to use IP addresses instead of hardcoding "localhost"? I understood Tom to be saying we should not rely on "localhost" for authentication, not that we shouldn't use it at all. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 13:41, Magnus Hagander wrote: That's only for a narrow definition of availability. For a lot of people, having access to your data isn't considered availability unless you can trust the data... Ok, fair enough. For that, synchronous replication in the "wait forever" mode is the only alternative. That on its own doesn't give you any boost in availability, on the contrary, but coupled with suitable clustering tools to handle failover and deciding when the standby is dead, you can achieve that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Wed, Oct 6, 2010 at 10:17, Heikki Linnakangas wrote: > On 06.10.2010 11:09, Fujii Masao wrote: >> >> On Wed, Oct 6, 2010 at 3:31 PM, Heikki Linnakangas >> wrote: >>> >>> No. Synchronous replication does not help with availability. It allows >>> you >>> to achieve zero data loss, ie. if the master dies, you are guaranteed >>> that >>> any transaction that was acknowledged as committed, is still committed. >> >> Hmm.. but we can increase availability without any data loss by using >> synchronous >> replication. Many people have already been using synchronous >> replication software >> such as DRBD for that purpose. > > Sure, but it's not the synchronous aspect that increases availability. It's > the replication aspect, and we already have that. Making the replication > synchronous allows zero data loss in case the master suddenly dies, but it > comes at the cost of availability. That's only for a narrow definition of availability. For a lot of people, having access to your data isn't considered availability unless you can trust the data... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gincostestimate
On Wed, Sep 8, 2010 at 1:02 AM, Teodor Sigaev wrote: > Fixed, and slightly reworked to be more clear. > Attached patch is based on your patch. The patch will improve the accuracy of plans using GIN indexes. It only adds block-level statistics information into the meta pages in GIN indexes. Data-level statistics are not collected by the patch, and there are no changes in pg_statistic. The stats page is updated only in VACUUM. ANALYZE doesn't update the information at all. In addition, REINDEX, VACUUM FULL, and CLUSTER reset the information to zero, which is not desirable. Is it possible to fill the statistic fields during bulk index build? No one wants to run VACUUM after VACUUM FULL to update the GIN stats. We don't have any methods to dump the meta information at all. It might be internal information, but some developers and debuggers might want such kinds of tools. Contrib/pageinspect might be a good location to have such a function; it has bt_metap(). The patch can be applied cleanly, no compiler warnings, and it passed all existing regression tests. There is no additional documentation and there are no regression tests -- I'm not sure whether we should have them. If the patch is an internal improvement, docs are not needed. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] is sync rep stalled?
Tom Lane writes: > I think the point here is that it's possible to have sync-rep > configurations in which it's impossible to take a base backup. Sorry to be slow. I still don't understand that problem. I can understand why people want "wait forever", but I can't understand when the following strange idea applies: consider my non-ready standby there as a full member of the distributed setup already. I've been making plenty of noise about this topic in the past, at the beginning of plans for SR in 9.0 IIRC, pushing Heikki into having a worked-out state machine to figure out what are the known states of a standby and what we can do with each. We've cancelled that and said it would maybe be necessary for Synchronous Replication. Here we go, right? So, first things first: when is it a good idea to treat a standby that hasn't yet had its base backup (let alone validated that, after taking it, the master still has enough WAL for the backup to be valid as far as initialising the slave goes) as someone we wait forever on? I say a standby is registered when it's currently "attached" and already able to keep up in async. That's a time when you can slow down the master until this new member catches up to full sync or whatever you've set up. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support Lack of Google and archives-fu today means no link to those mails. Yet… -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep at Oct 5
On Tue, 2010-10-05 at 11:30 -0400, Steve Singer wrote: > Also on the topic of failover how do we want to deal with the master > failing over. Say M->{S1,S2} and M fails and we promote S1 to M1. Can > M1->S2? What if S2 was further along in processing than S1 when M > failed? I don't think we want to take on this complexity for 9.1 but > this means that after M fails you won't have a synchronous replica until > you rebuild or somehow reset S2. Those are problems that can be resolved, but that is the current state. The trick, I guess, is to promote the correct standby. Those are generic issues, not related to any specific patch. Thanks for keeping those issues in the limelight. > > == Path Minimization == > > > > We want to be able to minimize and control the path of data transfer, > > * so that the current master doesn't have initiate transfer to all > > dependent nodes, thereby reducing overhead on master > > * so that if the path from current master to descendent is expensive we > > would minimize network costs. > > > > This requirement is commonly known as "relaying". > > > > In its most simply stated form, we want one standby to be able to get > > WAL data from another standby. e.g. M -> S -> S. Stating the problem in > > that way misses out on the actual requirement, since people would like > > the arrangement to be robust in case of failures of M or any S. If we > > specify the exact arrangement of paths then we need to respecify the > > arrangement of paths if a server goes down. > > Are we going to allow these paths to be reconfigured on a live cluster? > If we have M->S1->S2 and we want to reconfigure S2 to read from M then > S2 needs to get the data that has already been committed on S1 from > somewhere (either S1 or M). This has solutions but it adds to the > complexity. Maybe not for 9.1 If you switch from M -> S1 -> S2 to M -> (S1, S2) it should work fine. 
At the moment that needs a shutdown/restart, but that could easily be done with a disconnect/reconnect following a file reload. The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 10:53 AM, Heikki Linnakangas wrote: > Wow, that is really short. Are you sure? I have no first-hand experience > with DRBD, Neither do I. > and reading that man page, I get the impression that the > timeout is just for deciding that the TCP connection is dead. There is > also the ko-count parameter, which defaults to zero. I would guess that > ko-count=0 is "wait forever", while ko-count=1 is what you described, > but I'm not sure. Yeah, sounds more likely. Then I'm surprised that I didn't find any warning that Protocol C definitely reduces availability (with the ko-count=0 default, that is). Instead, they only state that it's the most used replication mode, which really makes me wonder. [1] Sorry for adding confusion by not researching properly. Regards Markus Wanner [1] DRBD Replication Modes http://www.drbd.org/users-guide-emb/s-replication-protocols.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 11:49, Fujii Masao wrote: On Wed, Oct 6, 2010 at 5:17 PM, Heikki Linnakangas wrote: Sure, but it's not the synchronous aspect that increases availability. It's the replication aspect, and we already have that. Making the replication synchronous allows zero data loss in case the master suddenly dies, but it comes at the cost of availability. Yep. But I mean that the synchronous aspect is helpful to increase the availability of the system which requires no data loss. In asynchronous replication, when the master goes down, we have to salvage the missing WAL for the standby from the failed master to avoid data loss. This would take a very long time and decrease the availability of the system which doesn't accept any data loss. Since the synchronous approach doesn't require such salvaging, it can increase the availability of such a system. In general, salvaging the WAL that was not sent to the standby yet is outright impossible. You can't achieve zero data loss with asynchronous replication at all. If we want only no data loss, we only have to implement the wait-forever option. But if we take the above-mentioned availability into consideration, the return-immediately option would also be required. In some (many, I think) cases, I think that we need to consider availability and no data loss together, and consider the balance of them. If you need both, you need three servers as Simon pointed out earlier. There is no way around that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 11:39, Markus Wanner wrote: On 10/06/2010 10:17 AM, Heikki Linnakangas wrote: On 06.10.2010 11:09, Fujii Masao wrote: Hmm.. but we can increase availability without any data loss by using synchronous replication. Many people have already been using synchronous replication software such as DRBD for that purpose. Sure, but it's not the synchronous aspect that increases availability. It's the replication aspect, and we already have that. ..the *asynchronous* replication aspect, yes. The drbd.conf man page [1] describes parameters of DRBD. It's worth noting that even in "Protocol C" (synchronous mode), they sport a timeout of only 6 seconds (by default). Wow, that is really short. Are you sure? I have no first-hand experience with DRBD, and reading that man page, I get the impression that the timeout is just for deciding that the TCP connection is dead. There is also the ko-count parameter, which defaults to zero. I would guess that ko-count=0 is "wait forever", while ko-count=1 is what you described, but I'm not sure. It's not hard to imagine the master failing in a way that first causes the connection to the standby to drop, and the disk failing 6 seconds later. A fire that destroys the network cable first and then spreads to the disk array for example. [1]: drbd.conf man page: http://www.drbd.org/users-guide/re-drbdconf.html -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Wed, Oct 6, 2010 at 5:17 PM, Heikki Linnakangas wrote: > On 06.10.2010 11:09, Fujii Masao wrote: >> >> On Wed, Oct 6, 2010 at 3:31 PM, Heikki Linnakangas >> wrote: >>> >>> No. Synchronous replication does not help with availability. It allows >>> you >>> to achieve zero data loss, ie. if the master dies, you are guaranteed >>> that >>> any transaction that was acknowledged as committed, is still committed. >> >> Hmm.. but we can increase availability without any data loss by using >> synchronous >> replication. Many people have already been using synchronous >> replication software >> such as DRBD for that purpose. > > Sure, but it's not the synchronous aspect that increases availability. It's > the replication aspect, and we already have that. Making the replication > synchronous allows zero data loss in case the master suddenly dies, but it > comes at the cost of availability. Yep. But I mean that the synchronous aspect is helpful to increase the availability of the system which requires no data loss. In asynchronous replication, when the master goes down, we have to salvage the missing WAL for the standby from the failed master to avoid data loss. This would take a very long time and decrease the availability of the system which doesn't accept any data loss. Since the synchronous approach doesn't require such salvaging, it can increase the availability of such a system. If we want only no data loss, we only have to implement the wait-forever option. But if we take the above-mentioned availability into consideration, the return-immediately option would also be required. In some (many, I think) cases, I think that we need to consider availability and no data loss together, and consider the balance of them. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 10:17 AM, Heikki Linnakangas wrote: > On 06.10.2010 11:09, Fujii Masao wrote: >> Hmm.. but we can increase availability without any data loss by using >> synchronous >> replication. Many people have already been using synchronous >> replication software >> such as DRBD for that purpose. > > Sure, but it's not the synchronous aspect that increases availability. > It's the replication aspect, and we already have that. ..the *asynchronous* replication aspect, yes. The drbd.conf man page [1] describes parameters of DRBD. It's worth noting that even in "Protocol C" (synchronous mode), they sport a timeout of only 6 seconds (by default). After that, the primary node proceeds without any kind of guarantee (which can be thought of as switching to async replication). Just as Simon proposes for Postgres as well. Maybe that really is enough for now. Everybody who needs stricter durability guarantees needs to wait for Postgres-R ;-) Regards Markus Wanner [1]: drbd.conf man page: http://www.drbd.org/users-guide/re-drbdconf.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Triggers on VIEWs
On 5 October 2010 21:17, Bernd Helmle wrote: > Basic summary of this patch: > Thanks for the review. > * The patch includes a fairly complete discussion about INSTEAD OF triggers > and their usage on views. There are also additional enhancements to the RULE > documentation, which seems, given that this might supersede the usage of > RULES for updatable views, reasonable. > > * The patch passes regression tests and comes with a bunch of its own > regression tests. I think they are complete, they cover statement and row > level triggers and show the usage for JOINed views for example. > > * I've checked against a draft of the SQL standard, the behavior of the > patch seems to match the spec (my copy might be out of date, however). > > * The code looks pretty good to me, there are some low level error messages > exposing some implementation details, which could be changed (e.g. "wholerow > is NULL"), but given that this is most of the time unexpected and is used in > some older code as well, this doesn't seem very important. > Hopefully that error should never happen, since it would indicate a bug in the code rather than a user error. > * The implementation introduces the notion of "wholerow". This is a junk > target list entry which allows the executor to carry the view information to > an INSTEAD OF trigger. In case of DELETE/UPDATE, the rewriter is responsible > for injecting the new "wholerow" TLE and making the original query point to a > new range table entry (correct me if I'm wrong), which is based on the > view's query. I'm not sure I'm happy with the notion of "wholerow" here, > maybe "viewrow" or "viewtarget" is more descriptive? > That's a good description of how it works. I chose "wholerow" because that matched similar terminology used already, for example in preptlist.c when doing FOR UPDATE/SHARE on a view. I don't feel strongly about it, but my inclination is to stick with "wholerow" unless someone feels strongly otherwise. 
> * I'm inclined to say that INSTEAD OF triggers have less overhead than > RULES, but this is not proven yet with a reasonable benchmark. > It's difficult to come up with a general statement on performance because there are so many variables that might affect it. In a few simple tests, I found that for repeated small updates RULEs and TRIGGERs perform roughly the same, but for bulk updates (one query updating 1000s of rows) a RULE is best. > I would like to do some more tests/review, but I'm going to mark this patch as > "Ready for Committer", so that someone more qualified on the executor part > can have a look at it during this commitfest, if that's okay for us? > Thanks for looking at it. I hope this is useful functionality to make it easier to write updatable views, and perhaps it will help with implementing auto-updatable views too. Cheers, Dean > -- > Thanks > > Bernd > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 11:09, Fujii Masao wrote: On Wed, Oct 6, 2010 at 3:31 PM, Heikki Linnakangas wrote: No. Synchronous replication does not help with availability. It allows you to achieve zero data loss, ie. if the master dies, you are guaranteed that any transaction that was acknowledged as committed, is still committed. Hmm.. but we can increase availability without any data loss by using synchronous replication. Many people have already been using synchronous replication software such as DRBD for that purpose. Sure, but it's not the synchronous aspect that increases availability. It's the replication aspect, and we already have that. Making the replication synchronous allows zero data loss in case the master suddenly dies, but it comes at the cost of availability. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Wed, Oct 6, 2010 at 3:31 PM, Heikki Linnakangas wrote: > No. Synchronous replication does not help with availability. It allows you > to achieve zero data loss, ie. if the master dies, you are guaranteed that > any transaction that was acknowledged as committed, is still committed. Hmm.. but we can increase availability without any data loss by using synchronous replication. Many people have already been using synchronous replication software such as DRBD for that purpose. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 08:31 AM, Heikki Linnakangas wrote: > On 06.10.2010 01:14, Josh Berkus wrote: >> Last I checked, our goal with synch standby was to increase availability, >> not decrease it. > > No. Synchronous replication does not help with availability. It allows > you to achieve zero data loss, ie. if the master dies, you are > guaranteed that any transaction that was acknowledged as committed, is > still committed. Strictly speaking, it even reduces availability. Which is why nobody actually wants *only* synchronous replication. Instead they use quorum commit or semi-synchronous (shudder) replication, which only requires *some* nodes to be in sync, but effectively replicates asynchronously to the others. From that point of view, the requirement of having one synch and two async standbies is pretty much the same as having three synch standbies with a quorum commit of 1. (Except for additional availability of the latter variant, because in case of a failure of the one sync standby, any of the others can take over without admin intervention). Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] host name support in pg_hba.conf
On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: > So far as I can find, there is *no* standard > mandating that localhost means the loopback address. Should we then change pgstat.c to use IP addresses instead of hardcoding "localhost"? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Wed, Oct 6, 2010 at 10:52 AM, Jeff Davis wrote: > I'm not sure I entirely understand. I was concerned about the case of a > standby server being allowed to lag behind the rest by a large number of > WAL records. That can't happen in the "wait for all servers to apply" > case, because the system would become unavailable rather than allow a > significant difference in the amount of WAL applied. > > I'm not saying that an unavailable system is good, but I don't see how > my particular complaint applies to the "wait for all servers to apply" > case. > > The case I was worried about is: > * 1 master and 2 standby > * The rule is "wait for at least one standby to apply the WAL" > > In your notation, I believe that's M -> { S1, S2 } > > In that case, if one S1 is just a little faster than S2, then S2 might > build up a significant queue of unapplied WAL. Then, when S1 goes down, > there's no way for the slower one to acknowledge a new transaction > without playing through all of the unapplied WAL. > > Intuitively, the administrator would think that he was getting both HA > and redundancy, but in reality the availability is no better than if > there were only two servers (M -> S1), except that it might be faster to > replay the WAL then to set up a new standby (but that's not guaranteed). Agreed. This is similar to my previous complaint. http://archives.postgresql.org/pgsql-hackers/2010-09/msg00946.php This problem would happen even if we fix the quorum to 1 as Josh proposes. To avoid this, the master must wait for ACK from all the connected synchronous standbys. I think that this is likely to happen especially when we choose the 'apply' replication level, because that level can easily make a synchronous standby lag due to conflicts between recovery and read-only queries. 
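The failure mode Jeff describes can be illustrated with a toy simulation; the record counts and apply rates below are invented for illustration, and the model tracks nothing but apply progress:

```python
# Toy simulation of the complaint: with M -> {S1, S2} and "ack as soon as at
# least one standby has applied", a slightly slower S2 quietly accumulates
# unapplied WAL.  All numbers are made up for illustration.
def run(records, s1_per_tick=10, s2_per_tick=9):
    s1_applied = s2_applied = committed = 0
    while committed < records:
        s1_applied = min(records, s1_applied + s1_per_tick)
        s2_applied = min(records, s2_applied + s2_per_tick)
        # the master acknowledges once the *fastest* standby has applied
        committed = max(s1_applied, s2_applied)
    # S2's backlog of acknowledged-but-unapplied WAL if S1 dies now
    return committed - s2_applied

backlog = run(1000)  # S2 is 100 records behind when S1 fails
```

The backlog grows linearly with the amount of committed WAL, so a standby that is only 10% slower is arbitrarily far behind by the time the fast one fails, which is exactly why "wait for ACK from all connected synchronous standbys" is needed to bound the gap.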
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 04:31 AM, Simon Riggs wrote: > That situation would require two things > * First, you have set up async replication and you're not monitoring it > properly. Shame on you. The way I read it, Jeff is complaining about the timeout you propose that effectively turns sync into async replication in case of a failure. With a master that waits forever, the standby that's newly required for quorum certainly still needs its time to catch up. But it wouldn't live in danger of being "optimized away" for availability in case it cannot catch up within the given timeout. It's a tradeoff between availability and durability. > So it can occur in both cases, though it now looks to me that it's less > important an issue in either case. So I think this doesn't rate the term > dangerous to describe it any longer. The proposed timeout certainly still sounds dangerous to me. I'd rather recommend setting it to an incredibly huge value to minimize its dangers and get sync replication when that is what has been asked for. Use async replication for increased availability. Or do you envision any use case that requires a quorum of X standbies for normal operation but is just fine with only none to (X-1) standbies in case of failures? IMO that's when sync replication is most needed and when it absolutely should hold to its promises - even if it means to stop the system. There's no point in continuing operation if you cannot guarantee the minimum requirements for durability. If you happen to want such a thing, you had better rethink your minimum requirement (as performance for normal operations might benefit from a lower minimum as well). Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 06.10.2010 01:14, Josh Berkus wrote: You start a new one from the latest base backup and let it catch up? Possibly modifying the config file in the master to let it know about the new standby, if we go down that path. This part doesn't seem particularly hard to me. Agreed, not sure of the issue there. See previous post. The critical phrase is *without restarting the master*. AFAICT, no patch has addressed the need to change the master's synch configuration without restarting it. It's possible that I'm not following something, in which case I'd love to have it pointed out. Fair enough. I agree it's important that the configuration can be changed on the fly. It's orthogonal to the other things discussed, so let's just assume for now that we'll have that. If not in the first version, it can be added afterwards. "pg_ctl reload" is probably how it will be done. There are some interesting behavioral questions there on what happens when the configuration is changed. Like if you first define that 3 out of 5 servers must acknowledge, and you have an in-progress commit that has received 2 acks already. If you then change the config to "2 out of 4" servers must acknowledge, is the in-progress commit now satisfied? From the admin point of view, the server that was removed from the system might've been one that had acknowledged already, and logically in the new configuration the transaction has only received 1 acknowledgment from those servers that are still part of the system. Explicitly naming the standbys in the config file would solve that particular corner case, but it would no doubt introduce other similar ones. But it's an orthogonal issue, we'll figure it out when we get there. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
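That reconfiguration corner case can be made concrete with a small sketch; the helper and its counting rule (only acks from servers still in the configuration count) are one possible interpretation, not a proposed design:

```python
# Sketch of the corner case: a commit needed 3 of 5 acks and has received 2;
# the config then changes to 2 of 4, dropping a server that already acked.
# Whether the commit is now "satisfied" depends on whether acks from removed
# servers still count.  Names and the counting rule are illustrative only.
def satisfied(acks_from, current_standbys, quorum):
    # count only acks from standbys that are still part of the configuration
    return len(acks_from & current_standbys) >= quorum

acks = {"s1", "s2"}                       # acks received before the reload
old_ok = satisfied(acks, {"s1", "s2", "s3", "s4", "s5"}, 3)   # 2 < 3
naive_ok = len(acks) >= 2                 # raw ack count meets the new quorum
strict_ok = satisfied(acks, {"s1", "s3", "s4", "s5"}, 2)      # only s1 counts
```

Under the strict rule the in-progress commit keeps waiting after the reload even though its raw ack count meets the new quorum, which matches the admin's intuition Heikki describes.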