Re: [HACKERS] Bug in new buffering GiST build code
On 05.06.2012 09:45, Alexander Korotkov wrote:
> On Tue, Jun 5, 2012 at 2:00 AM, Alexander Korotkov wrote:
>> On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
>>> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>>>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
>>> I think I can restore the test environment and data. Will rerun the tests soon.
>> I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:
>>
>>                  old   new
>> usnoa2          2385  2452
>> usnoa2_shuffled 8131  8055
>> uniform         8327  8359
>>
>> I think the difference can be described by "round" error.
> Oh, I meant not "round" error but "random" error, i.e. not exactly the same state of shared buffers at the start of each index build, and so on.

Thanks. I was expecting a small performance gain from the new approach, but I guess not. Oh well, that would've just been a bonus; the important thing is that it now works.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in new buffering GiST build code
On Tue, Jun 5, 2012 at 2:00 AM, Alexander Korotkov wrote:
> On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
>> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
>> I think I can restore the test environment and data. Will rerun the tests soon.
> I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:
>
>                  old   new
> usnoa2          2385  2452
> usnoa2_shuffled 8131  8055
> uniform         8327  8359
>
> I think the difference can be described by "round" error.

Oh, I meant not "round" error but "random" error, i.e. not exactly the same state of shared buffers at the start of each index build, and so on.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] Unnecessary WAL archiving after failover
On Fri, Mar 23, 2012 at 11:03:27PM +0900, Fujii Masao wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao wrote:
>> In streaming replication, after failover, the new master might have lots of un-applied WAL files with the old timeline ID. They are the WAL files which were recycled as future ones while the server was running as a standby. Since they will never be used later, they don't need to be archived after failover. But since they have neither a .ready nor a .done file in archive_status, checkpoints after failover newly create .ready files for them, and then finally they are archived. Which might cause a disk I/O spike in both the WAL and archive storage.

If the old master archived later WAL that the new master never restored, won't this attempt to archive a file under a name that already exists in the archive? The documentation says this:

    The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). It is advisable to test your proposed archive command to ensure that it indeed does not overwrite an existing file, and that it returns nonzero status in this case.

Archiving on the new master would halt until the operator intervenes.

>> To avoid the above problem, I think that un-applied WAL files with the old timeline ID should be marked as already-archived and recycled immediately at the end of recovery. Thoughts?

A small hazard comes to mind. If the administrator manually copied post-timeline-divergence segments from the failed master to the new master's pg_xlog, the current implementation loads them into the archive for you. The new master could never apply those files locally, but they might be useful for alternate recoveries down the previous timeline.
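The overwrite-refusing behavior the documentation calls for can be sketched as a tiny shell wrapper: "test ! -f" exits nonzero if the target already exists, so the archiver retries later instead of clobbering a segment that was already archived (e.g. by the old master). All paths and segment names below are made-up placeholders, not anyone's actual setup.

```shell
# Minimal archive_command-style wrapper that refuses to overwrite.
archive() {
    seg=$1
    archivedir=$2
    target="$archivedir/$(basename "$seg")"
    # Fails (nonzero status) if the target already exists.
    test ! -f "$target" && cp "$seg" "$target"
}

# Demo: archiving the same segment name twice; the second attempt fails.
dir=$(mktemp -d)
mkdir -p "$dir/archive"
echo "wal data" > "$dir/000000010000000000000001"
archive "$dir/000000010000000000000001" "$dir/archive" &&
    echo "archived"            # prints "archived"
archive "$dir/000000010000000000000001" "$dir/archive" ||
    echo "refused overwrite"   # prints "refused overwrite"
```

A real archive_command would also need to fsync the copied file and handle partial copies; this sketch only shows the no-overwrite check.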
Nonetheless, we can just as reasonably specify that it's not the role of the new master to provide this service. Call the fact that it did so in previous releases an implementation artifact. What about instead creating an archive status file at recycle time and deleting it as we begin to populate the file? That would distinguish copied-in, unarchived segments from recycled ones.

Incidentally, RemoveOldXlogFiles() has this comment:

    /*
     * We ignore the timeline part of the XLOG segment identifiers in
     * deciding whether a segment is still needed.  This ensures that we
     * won't prematurely remove a segment from a parent timeline. We could
     * probably be a little more proactive about removing segments of
     * non-parent timelines, but that would be a whole lot more
     * complicated.
     */

Should both instances of "parent" be "child" or "descendant"?

> Just after failover, there can be three kinds of WAL files in the new master's pg_xlog directory:
>
> (1) WAL files which were recycled by a restartpoint
>
> I've already explained upthread the issue which these WAL files cause after failover.
>
> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover because they are always restored onto the temporary filename "RECOVERYXLOG". So the issue which I explain from now doesn't exist in 9.1 or before.
>
> In 9.2dev, as a result of supporting cascading replication, an archived WAL file is restored onto its correct file name so that a cascading walsender can send it to another standby. This restored

The documentation still says this:

    WAL segments that cannot be found in the archive will be sought in pg_xlog/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_xlog/. The system will not overwrite the existing contents of pg_xlog/ when retrieving archived files.

I gather the last sentence is now false?
> WAL file has neither a .ready nor a .done archive status file. After failover, a checkpoint checks the archive status file of the restored WAL file to attempt to recycle it, finds that it has neither .ready nor .done, and creates .ready. Because of the existence of .ready, it will be archived again even though it obviously already exists in the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think that .done should be created whenever a WAL file is successfully restored (of course this should happen only when archive_mode is enabled). Thoughts?

Your proposed fix makes sense, and I cannot think of any disadvantage. Concerning only doing it when archive_mode=on, would there ever be a case where a segment is restored under archive_mode=off, then the server restarted with archive_mode=on and an archival attempted on that segment?
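The .ready/.done protocol under discussion can be illustrated with a small shell sketch. This is illustrative only, not PostgreSQL's archiver code, and the directory layout and segment names are made up: the archiver processes every segment carrying a .ready marker and renames the marker to .done after a successful archive_command, so pre-creating .done for a restored segment (the proposed fix) keeps it from ever being re-archived.

```shell
# Toy model of the archive_status directory protocol.
status_dir=$(mktemp -d)

mark_ready() { : > "$status_dir/$1.ready"; }   # segment awaits archiving
mark_done()  { : > "$status_dir/$1.done"; }    # segment already archived

run_archiver() {
    for f in "$status_dir"/*.ready; do
        [ -e "$f" ] || continue                # no .ready files at all
        seg=$(basename "$f" .ready)
        echo "archiving $seg"                  # stand-in for archive_command
        mv "$f" "$status_dir/$seg.done"        # mark as archived
    done
}

mark_ready 000000010000000000000002   # normal, locally generated WAL
mark_done  000000010000000000000003   # restored from archive: never re-archive
run_archiver                          # archives only segment ...0002
```

Running `run_archiver` a second time does nothing, since every segment now has a .done marker and no .ready remains.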
[HACKERS] 9.3: load path to mitigate load penalty for checksums
Introduction
============

A lot of progress has been made on the checksums issue, with Simon's excellent work beginning to gain consensus:

http://archives.postgresql.org/message-id/CA+u5nmkw_gbs6qq_y8-rjgl1v7mvw2hwbhartb8lojhnpfx...@mail.gmail.com

For the purposes of this proposal, I'll assume that's the general direction we'll be taking for CRCs. The major drawback of that proposal is that it increases the amount of work to be done after a large data load by requiring more WAL.

Proposal
========

I propose a special user-initiated loading mode at the table granularity. During this time, readers must ignore PD_ALL_VISIBLE, HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers may set all of those bits before the writing transaction commits, obviating the need to rewrite (and WAL) the data again later. Ideally, there would be no work for VACUUM to do after the data load (unless a transaction aborted). This would also help the non-CRC case of course, but I expect CRCs to make this significantly more important.

Goals
=====

* Table granularity (doesn't affect other tables at all)
* Allows concurrent loaders
* Allows loading into existing tables with existing data
* Online (allow reads to proceed, even if degraded)

Rough user interaction
======================

INITIATE LOAD ON foo AS 'job name';
-- run concurrent loading sessions
FINALIZE LOAD 'job name';

High-level design
=================

By "hints" I mean the VM bit, PD_ALL_VISIBLE, and HEAP_XMIN_COMMITTED. By "ignorehints" I mean a flag in pg_class indicating that readers should ignore hints. By "optimistichints" I mean a flag in pg_class indicating that writers can optimistically set hints. Obviously, readers and writers would need a mechanism to honor those flags, but I haven't dug into the details yet (additional routines in tqual.c?).
States:

0: normal
   * ignorehints = false
   * optimistichints = false

1: trying to enter data load mode, waiting on existing lockers (who might be scanning) to finish
   * ignorehints = true
   * optimistichints = false

2: data load mode
   * ignorehints = true
   * optimistichints = true

3: trying to leave data load mode, waiting on old snapshots to be released and aborted transactions to be cleaned up
   * ignorehints = true
   * optimistichints = false

INITIATE LOAD would first transition from state 0 to 1 by acquiring a ShareUpdateExclusiveLock on the table (to be sure no concurrent INITIATE or FINALIZE LOAD is going on) and setting ignorehints = true. Then it moves from state 1 to state 2 by waiting for all transactions that hold a lock on that table. Any transactions that don't already have a lock will see the new flag when they try to get it. Now we're sure that all readers will see the "ignorehints" flag, so we can set the "optimistichints" flag to indicate that writers can write hints optimistically.

FINALIZE LOAD would first move from state 2 to state 3 by acquiring a ShareUpdateExclusiveLock on the table and setting optimistichints = false. Then, it would move from state 3 to state 0 by first waiting for all transactions that currently hold a lock on the table, to ensure they see the optimistichints=false flag. Then, it would remember the current xid as max_loader_xid, and wait until the global xmin is greater than max_loader_xid. This should ensure that all snapshots regard all loading transactions as complete. Also, it would need to run a lazy VACUUM to remove any tuples from aborted transactions.

Details and optimizations
=========================

* We probably want a graceful way to handle multiple data loads happening on the same table. Rather than erroring out, we could treat it more like a reference count, and only do the work to move into data load mode if we're not there already, and only move out of data load mode if we're the last loading job on the table.
* In general, there are some usability issues to sort out, to make sure a table isn't put into data load mode and left that way. Right now, I'm mostly concerned with getting a working design, but those will be important, too.

* We could optimize away the VACUUM going from 3 -> 0 if we are sure no writing transactions aborted.

* INITIATE and FINALIZE probably need to use PreventTransactionChain() and multiple transactions, to avoid holding the ShareUpdateExclusiveLock for too long. Also, we want to keep people from using them in the same transaction as the loading xact, because they might not realize that they would get a concurrency of 1 that way (because of the ShareUpdateExclusiveLock).

Thoughts?

Regards,
Jeff Davis
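The state transitions described under "States" can be sketched as a toy shell script. This is purely illustrative: the two pg_class flags are modeled as shell variables, the lock-waiting and VACUUM steps are elided, and nothing here is real PostgreSQL syntax.

```shell
#!/bin/sh
# Toy model of the proposed load-mode state machine.  The flag pair
# determines the state; states 1 and 3 share the same flag values and
# differ only in which transition is in progress.
ignorehints=false
optimistichints=false

state() {
    case "$ignorehints,$optimistichints" in
        false,false) echo 0 ;;     # normal
        true,true)   echo 2 ;;     # data load mode
        true,false)  echo "1/3" ;; # entering or leaving load mode
    esac
}

initiate_load() {
    ignorehints=true      # 0 -> 1: readers start ignoring hints
    # ... wait for existing lockers on the table to finish ...
    optimistichints=true  # 1 -> 2: writers may now set hints eagerly
}

finalize_load() {
    optimistichints=false # 2 -> 3: stop optimistic hint-setting
    # ... wait for old snapshots; lazy VACUUM away aborted loaders ...
    ignorehints=false     # 3 -> 0: back to normal
}

state          # prints 0
initiate_load
state          # prints 2
finalize_load
state          # prints 0
```

The reference-count idea from the first bullet would wrap initiate_load/finalize_load so that only the first INITIATE and the last FINALIZE actually change the flags.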
Re: [HACKERS] [RFC] Interface of Row Level Security
Florian Pflug writes:
> On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
>> 2012/6/4 Florian Pflug:
>>> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>> It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).
> Sure. But that requires each and every application which uses RLS to provide support for special backup (or replication, or whatever) privileges. And it requires the DBA to know how to assign these privileges to certain roles for each and every application in question. For shops which use a lot of different applications, all with their own RLS policies, that can quickly get out of hand.
> Plus, a bug in one of these RLS policies has the potential to render backups incomplete.

I agree with Florian here: if there is no way to disable RLS for backups, the database will be un-administratable. RLSBYPASS is not necessarily the only or best way to provide such an override, but we have to have something that is simple, foolproof, and *not* dependent on the details of any one RLS policy.

I suspect that KaiGai-san's objection basically comes down to not wanting to have what amounts to a backdoor in RLS policies. However, what Florian is saying is that you have to have a backdoor anyway, unless you'd like to be at risk of losing data because it wasn't backed up. You can either have one well-understood, well-documented, well-tested backdoor, or you can have an ad-hoc backdoor in each RLS policy. Nobody can think that the latter approach is preferable.
regards, tom lane
Re: [HACKERS] [RFC] Interface of Row Level Security
On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
> 2012/6/4 Florian Pflug:
>> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>>
>> With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.
> It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).

Sure. But that requires each and every application which uses RLS to provide support for special backup (or replication, or whatever) privileges. And it requires the DBA to know how to assign these privileges to certain roles for each and every application in question. For shops which use a lot of different applications, all with their own RLS policies, that can quickly get out of hand.

Plus, a bug in one of these RLS policies has the potential to render backups incomplete.

best regards,
Florian Pflug
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
Jim Nasby writes:
> On 5/27/12 2:54 PM, Euler Taveira wrote:
>> On 27-05-2012 10:45, Fujii Masao wrote:
>>> OK, let me propose another approach: add pg_size_pretty(int).
>> I wouldn't like to add another function but if it solves both problems... +1.
> FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is pretty contrived...

Yeah, possibly. In any case, I don't think we're making either of these changes in 9.2, because the time for forcing initdbs is past. It would only be realistic to think about changing pg_size_pretty() if we come across some other, much more compelling reason to force a system catalog contents change. Assuming that's how 9.2 ships, we might as well wait to see if there are any real complaints from the field before we decide whether any change is needed.

regards, tom lane
Re: [HACKERS] Bug in new buffering GiST build code
On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
> I think I can restore the test environment and data. Will rerun the tests soon.

I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:

                 old   new
usnoa2          2385  2452
usnoa2_shuffled 8131  8055
uniform         8327  8359

I think the difference can be described by round error. The indexes seem to be exactly the same. That's predictable, because changing the algorithm for refinding the parent shouldn't change the result.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
On 5/27/12 2:54 PM, Euler Taveira wrote:
> On 27-05-2012 10:45, Fujii Masao wrote:
>> OK, let me propose another approach: add pg_size_pretty(int). If we do this, all usability and performance problems will be solved.
> I wouldn't like to add another function but if it solves both problems... +1.

FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is pretty contrived... when would you actually do something like that? ISTM that any time you're using pg_size_pretty you'd be coming off a real datatype.

--
Jim C. Nasby, Database Architect
j...@nasby.net   512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] VIP: new format for psql - shell - simple using psql in shell
On 5/26/12 10:16 PM, Pavel Stehule wrote:
> My idea is safe with respect to the separator, because the separator is just space and newline, and these symbols are escaped.

ISTM it'd be a really good idea to support something other than space, since presumably that'd be trivial.

I'm not a fan of supporting the array construction. If you get to that level of complexity in bash it's a really good sign that you need to use a real language. If we do add array support then it needs to be keyed to the actual shell in use, because it's inconsistent between them.

--
Jim C. Nasby, Database Architect
j...@nasby.net   512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On Sat, Jun 2, 2012 at 12:01 PM, Michael Nolan wrote:
> On 6/2/12, Tom Lane wrote:
>> Robert Haas writes:
>>> On the other hand, if we simply say "PostgreSQL computes the replication delay by subtracting the time at which the WAL was generated, as recorded on the master, from the time at which it is replayed by the slave" then, hey, we still have a wart, but it's pretty clear what the wart is and how to fix it, and we can easily document that. Again, if we could get rid of the failure modes and make this really water-tight, I think I'd be in favor of that, but it seems to me that we are in the process of expending a lot of energy and an even larger amount of calendar time to create a system that will misbehave in numerous subtle ways instead of one straightforward one. I don't see that as a good trade.
>> Well, okay, but let's document "if you use this feature, it's incumbent on you to make sure the master and slave clocks are synced. We recommend running NTP." or words to that effect.
> What if the two servers are in different time zones?

NTP shouldn't have any problem; it uses UTC underneath. As does PostgreSQL, underneath.

--
When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Tom Lane:
> Kohei KaiGai writes:
>> Here are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now.
> How is that specifically the fault of RLSBYPASS? *Any* of the schemes you're proposing for inlined RLS checks will have problems with userID switching.

Really? I can't find a scenario that causes a problem with user-ID switching in the case where the RLS policy is *unconditionally* appended and then evaluated at executor stage. I'd like to see the scenario.

> My guess is we'd have to treat the effective userID as part of the plancache lookup key to make it safe to inline anything related to RLS.

It might be a solution, if we append the individual RLS policy at the planner stage, depending on user-id.

Thanks,
--
KaiGai Kohei
Re: [HACKERS] Btree or not btree? That is the question
Greg Sabino Mullane writes:
> We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday:
> ERROR: index "pg_class_oid_index" is not a btree

That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ...

regards, tom lane
Re: [HACKERS] [RFC] Interface of Row Level Security
Kohei KaiGai writes:
> Here are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now.

How is that specifically the fault of RLSBYPASS? *Any* of the schemes you're proposing for inlined RLS checks will have problems with userID switching.

My guess is we'd have to treat the effective userID as part of the plancache lookup key to make it safe to inline anything related to RLS.

regards, tom lane
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 10:42 AM, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure wrote:
>> On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote:
>>> What happens (in the very unlikely, but possible case?) if another backend races to the buffer you've pointed at with 'victim'? It looks like multiple backends share the clock sweep now, but don't you need an extra test to ensure it's still a candidate victim buffer?
>> Actually, I don't think you do: the existing check on refcount is probably good enough. Hm, why did you get rid of BufferStrategyControl.lastFreeBuffer?
> It was dead code as far as I could tell. That change isn't actually relevant for this patch because free-list management is still protected by a lock (except the initial unlocked test, which is double-checked under lock) and so doesn't need any adjustment.

I have to admit -- this is pretty cool. The changes look pretty clean, but I think it's going to need some benchmarking to prove it's actually faster under all workloads, and some more review. If there's heavy contention on getting victim buffers you're going to be making a lot of atomic operations -- each one of which will lock down the cache line. A yielding lwlock is obviously going to have very different behavior under contention, so there needs to be a performance test that really burns up the freelist.

If it's Well And Truly faster, you have to immediately start wondering if the freelist lock can't be completely scrapped: buffer invalidation only occurs in fairly rare events -- but when it does, ISTM to be fairly counterproductive. In trying to avoid the sweep you end up pushing everyone towards a locky free list. I don't know about you, but I'd prefer to have more consistent behavior (as in, not "drop big table, concurrency drops 50%").

Noted, this doesn't really address the OP's problem :-). I bet it'd supplement the nailing strategy nicely, however.
merlin
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Florian Pflug:
> On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
>> I'm worried about future maintenance issues, once we have RLSBYPASS permission or something user visible…
> I fear that without a generic way to disable RLS regardless of which RLS policy function is in effect, we're creating a huge maintenance issue for DBAs. In a lot of shops, the DBA is responsible for a large number of databases, each potentially using a completely different approach to RLS and hence a completely different policy function.

There are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now. The second is that we can describe the same RLS policy in another way, without a PostgreSQL-original enhancement to the permission mechanism...

> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>
> With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.

It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).

These are the reasons why I hesitate to go ahead with the RLSBYPASS permission.

Thanks,
--
KaiGai Kohei
[HACKERS] Btree or not btree? That is the question
We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday:

ERROR: index "pg_class_oid_index" is not a btree

It seems like this happened from just a normal SELECT query, and it did not occur again (and this is a very busy system). The pg_class table does get vacuumed and reindexed often. All appeared fine when I did some manual investigation. Since this only happened this one time, can I chalk it up to some random rare race condition having to do with a reindex? Or is there anything else I should be looking for, or can proactively do? No other system catalog problems have been seen before it or since.

--
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 6:53 PM, Magnus Hagander wrote:
> No, it's more a "there's no reason to do that". I don't think it should necessarily be an actual problem.

Ok, good to know.

> In your case the missing piece of information is why was there a timeline switch? pg_basebackup shouldn't cause a timeline switch whether you use it in -x mode or not.

No mystery there. The timeline switch was because I had just promoted the master from standby mode. There's a chance I might have accidentally done something horribly wrong somewhere, because I can't immediately reproduce this. I'll let you know if I find out how I managed to create this error.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] [RFC] Interface of Row Level Security
On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
> I'm worried about future maintenance issues, once we have RLSBYPASS permission or something user visible…

I fear that without a generic way to disable RLS regardless of which RLS policy function is in effect, we're creating a huge maintenance issue for DBAs. In a lot of shops, the DBA is responsible for a large number of databases, each potentially using a completely different approach to RLS and hence a completely different policy function.

Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.

With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.

best regards,
Florian Pflug
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 5:48 PM, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao wrote:
>> On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote:
>>> On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote:
>>>>> it doesn't say that it is not possible to use this for a standby server... probably that's why I get the error when I put a recovery.conf after pg_basebackup finished... maybe we can say that more loudly?
>>>> The idea is, if you use it with -x (or --xlog), it's for taking a backup/clone, *not* for replication. If you use it without -x, then you can use it as the start of a replica, by adding a recovery.conf. But you can't do both at once, that will confuse it.
>>> I stumbled upon this again today. There's nothing in the docs that would even hint that using -x shouldn't work to create a replica. Why does it get confused, and can we (easily) make it not get confused? At the very least it needs a big fat warning in the documentation for the -x option that the resulting backup might not be usable as a standby.
>> Unless I'm missing something, you can use pg_basebackup -x for the standby. If lots of WAL files are generated in the master after pg_basebackup -x ends and before you start the standby instance, you may get the following error. In this case, you need to consult the archived WAL files even though you specified the -x option in pg_basebackup.
>>
>>> FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0001005C has already been removed
>>
>> Though we have the above problem, pg_basebackup -x is usable for the standby, I think.
> I assumed from Magnus's comment that this is a known problem. I wonder what went wrong if it should have worked. In the case where this turned up, the missing file was an xlog file with the new timeline ID but one segment before the timeline switch. I'll have to see if I can create a reproducible case for this.

No, it's more a "there's no reason to do that".
I don't think it should necessarily be an actual problem. In your case the missing piece of information is why there was a timeline switch: pg_basebackup shouldn't cause a timeline switch whether you use it in -x mode or not. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao wrote: > On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote: >> On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander >> wrote: it doesn't say that is not possible to use this for a standby server... probably that's why i get the error i put a recovery.conf after pg_basebackup finished... maybe we can say that more loudly? >>> >>> The idea is, if you use it with -x (or --xlog), it's for taking a >>> backup/clone, *not* for replication. >>> >>> If you use it without -x, then you can use it as the start of a >>> replica, by adding a recovery.conf. >>> >>> But you can't do both at once, that will confuse it. >> >> I stumbled upon this again today. There's nothing in the docs that >> would even hint that using -x shouldn't work to create a replica. Why >> does it get confused and can we (easily) make it not get confused? At >> the very least it needs a big fat warning in documentation for the -x >> option that the resulting backup might not be usable as a standby. > > Unless I'm missing something, you can use pg_basebackup -x for the > standby. If lots of WAL files are generated in the master after > pg_basebackup -x ends and before you start the standby instance, > you may get the following error. In this case, you need to consult with > archived WAL files even though you specified -x option in pg_basebackup. > >> FATAL: could not receive data from WAL stream: FATAL: requested WAL >> segment 0001005C has already been removed > > Though we have the above problem, pg_basebackup -x is usable for > the standby, I think. I assumed from Magnus's comment that this is a known problem. I wonder what went wrong if it should have worked. In the case where this turned up the missing file was an xlog file with the new timeline ID but one segment before the timeline switch. I'll have to see if I can create a reproducible case for this. 
Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure wrote: > On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote: >> What happens (in the very unlikely, but possible, case) if another >> backend races to the buffer you've pointed at with 'victim'? It looks >> like multiple backends share the clock sweep now, but don't you need >> an extra test to ensure it's still a candidate victim buffer? > > Actually, I don't think you do: the existing check on refcount is > probably good enough. Hm, why did you get rid of > BufferStrategyControl.lastFreeBuffer? It was dead code as far as I could tell. That change isn't actually relevant for this patch because free-list management is still protected by a lock (except for the initial unlocked test, which is double-checked under the lock) and so doesn't need any adjustment. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Robert Haas : > On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai wrote: >> 2012/6/1 Robert Haas : >>> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai wrote: It may be an option to separate the case into two; a situation to execute the given query immediately just after optimization and never reused, and others. >>> >>> Yes. Simon suggested exactly this a while back, and I agree with him >>> that we ought to have it. >>> >> It is good for me, also. >> Then, if so, we will be able to push the stuff corresponding to RLSBYPASS into the query optimization, and works transparently for users. >>> >>> You're still going to need a way to make sure that the cluster can be >>> dumped properly. RLSBYPASS accomplishes that; your scheme doesn't. >>> >> If something like "has_superuser_privilege() OR" is automatically >> appended to user given clauses, it makes sure whole of the database >> cluster is dumped. >> That also means any permission checks are delayed to executor stage >> (except for the case on simple query protocol, I mentioned above), >> thus it simplifies the condition to invalidate prepared statement. >> >> One problem I noticed last night around RLSBYPASS approach is: >> it can take much number of invalidation whenever current user-id is >> switched. Not only SET AUTHORIZATION, but SECURITY DEFINER >> function's invocation also. I'm not sure whether this invalidation >> storm is reasonable level, or not. >> >> Is it unavailable to handle this type of implicit superuser checks >> with existing EXECUTE statement? >> I tried to run EXPLAIN with similar case. 
>> >> postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2; >> PREPARE >> postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas')); >> QUERY PLAN >> >> Seq Scan on tbl (cost=0.00..21.60 rows=1160 width=40) >> (1 row) >> >> However, >> >> postgres=# PREPARE p2(int) AS SELECT * FROM tbl >> WHERE y > $1 OR current_user in ('kaigai','rhaas'); >> PREPARE >> postgres=# EXPLAIN EXECUTE p2(10); >> QUERY PLAN >> - >> Seq Scan on tbl (cost=0.00..30.30 rows=394 width=40) >> Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[]))) >> (2 rows) >> >> Please assume the second condition is something like a superuser >> check in addition to the main security policy. >> It implies an idea to replace a certain portion of a clause that >> consists of only constant values and stable / immutable functions >> by a shadow parameter to be calculated at executor stage, which makes >> it possible to wipe out the RLS policy for superusers. In addition, it requires >> no new invalidation mechanism for prepared statements. > I'm not sure how best to handle the invalidation issue... but I still > think that relying on the query planner to use theorem-proving to > simplify RLS conditions is going to be problematic. You can certainly > try it and see how it comes out... I think the planner enhancement can be handled independently from the RLS policy stuff; we should avoid packing too much complexity into one feature. At least, the idea of appending an implicit condition ("OR has_superuser_privilege()") will work correctly without any new invalidation mechanism and without the planner enhancement, although its performance penalty is not negligible. What is your opinion? I'm worried about future maintenance issues once we have an RLSBYPASS permission or something else user-visible... Thanks, -- KaiGai Kohei
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote: > What happens (in the very unlikely, but possible, case) if another > backend races to the buffer you've pointed at with 'victim'? It looks > like multiple backends share the clock sweep now, but don't you need > an extra test to ensure it's still a candidate victim buffer? Actually, I don't think you do: the existing check on refcount is probably good enough. Hm, why did you get rid of BufferStrategyControl.lastFreeBuffer? merlin
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote: > On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote: >>> it doesn't say that is not possible to use this for a standby >>> server... probably that's why i get the error i put a recovery.conf >>> after pg_basebackup finished... maybe we can say that more loudly? >> >> The idea is, if you use it with -x (or --xlog), it's for taking a >> backup/clone, *not* for replication. >> >> If you use it without -x, then you can use it as the start of a >> replica, by adding a recovery.conf. >> >> But you can't do both at once, that will confuse it. > > I stumbled upon this again today. There's nothing in the docs that > would even hint that using -x shouldn't work to create a replica. Why > does it get confused, and can we (easily) make it not get confused? At > the very least it needs a big fat warning in the documentation for the -x > option that the resulting backup might not be usable as a standby. Unless I'm missing something, you can use pg_basebackup -x for the standby. If lots of WAL files are generated on the master after pg_basebackup -x ends and before you start the standby instance, you may get the following error. In this case, you need to fall back to archived WAL files even though you specified the -x option to pg_basebackup. > FATAL: could not receive data from WAL stream: FATAL: requested WAL > segment 0001005C has already been removed Though we have the above problem, pg_basebackup -x is usable for the standby, I think. Regards, -- Fujii Masao
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 9:20 AM, Ants Aasma wrote: > On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas wrote: >> Not sure about the rest of this patch, but this part is definitely bogus: >>
>> +#if !defined(pg_atomic_fetch_and_set)
>> +#define pg_atomic_fetch_and_set(dst, src, value) \
>> +	do { S_LOCK(&dummy_spinlock); \
>> +	dst = src; \
>> +	src = value; \
>> +	S_UNLOCK(&dummy_spinlock); } while (0)
>> +#endif
>>
>> Locking a dummy backend-local spinlock doesn't provide atomicity >> across multiple processes. > > Right, of course. I don't know why I assumed that dummy_spinlock would > be global. In any case, this is very WIP and doesn't even aspire to be > portable yet. The main point was to see if there's any significant > performance to be gained by this method. yeah -- those are fallback routines in case the compiler primitives don't exist. I think I understand what Ants is doing here: he's reducing the coverage of the free-list lock to only cover actually popping a buffer off the free list; it no longer covers the clock sweep. That's a massive win if it works. In order to get away with that, he had to decompose all manipulations of the clock sweep and the strategy into thread-safe atomic operations. What happens (in the very unlikely, but possible, case) if another backend races to the buffer you've pointed at with 'victim'? It looks like multiple backends share the clock sweep now, but don't you need an extra test to ensure it's still a candidate victim buffer? merlin
Re: [HACKERS] [RFC] Interface of Row Level Security
On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai wrote: > 2012/6/1 Robert Haas : >> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai wrote: >>> It may be an option to separate the case into two; a situation to execute >>> the given query immediately just after optimization and never reused, >>> and others. >> >> Yes. Simon suggested exactly this a while back, and I agree with him >> that we ought to have it. >> > It is good for me, also. > >>> Then, if so, we will be able to push the stuff corresponding to >>> RLSBYPASS into the query optimization, and works transparently >>> for users. >> >> You're still going to need a way to make sure that the cluster can be >> dumped properly. RLSBYPASS accomplishes that; your scheme doesn't. >> > If something like "has_superuser_privilege() OR" is automatically > appended to user given clauses, it makes sure whole of the database > cluster is dumped. > That also means any permission checks are delayed to executor stage > (except for the case on simple query protocol, I mentioned above), > thus it simplifies the condition to invalidate prepared statement. > > One problem I noticed last night around RLSBYPASS approach is: > it can take much number of invalidation whenever current user-id is > switched. Not only SET AUTHORIZATION, but SECURITY DEFINER > function's invocation also. I'm not sure whether this invalidation > storm is reasonable level, or not. > > Is it unavailable to handle this type of implicit superuser checks > with existing EXECUTE statement? > I tried to run EXPLAIN with similar case. 
> > postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2; > PREPARE > postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas')); > QUERY PLAN > > Seq Scan on tbl (cost=0.00..21.60 rows=1160 width=40) > (1 row) > > However, > > postgres=# PREPARE p2(int) AS SELECT * FROM tbl > WHERE y > $1 OR current_user in ('kaigai','rhaas'); > PREPARE > postgres=# EXPLAIN EXECUTE p2(10); > QUERY PLAN > - > Seq Scan on tbl (cost=0.00..30.30 rows=394 width=40) > Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[]))) > (2 rows) > > Please assume the second condition is something like a superuser > check in addition to the main security policy. > It implies an idea to replace a certain portion of a clause that > consists of only constant values and stable / immutable functions > by a shadow parameter to be calculated at executor stage, which makes > it possible to wipe out the RLS policy for superusers. In addition, it requires > no new invalidation mechanism for prepared statements. I'm not sure how best to handle the invalidation issue... but I still think that relying on the query planner to use theorem-proving to simplify RLS conditions is going to be problematic. You can certainly try it and see how it comes out... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote: >> it doesn't say that is not possible to use this for a standby >> server... probably that's why i get the error i put a recovery.conf >> after pg_basebackup finished... maybe we can say that more loudly? > > The idea is, if you use it with -x (or --xlog), it's for taking a > backup/clone, *not* for replication. > > If you use it without -x, then you can use it as the start of a > replica, by adding a recovery.conf. > > But you can't do both at once, that will confuse it. I stumbled upon this again today. There's nothing in the docs that would even hint that using -x shouldn't work to create a replica. Why does it get confused, and can we (easily) make it not get confused? At the very least it needs a big fat warning in the documentation for the -x option that the resulting backup might not be usable as a standby. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas wrote: > Not sure about the rest of this patch, but this part is definitely bogus: >
> +#if !defined(pg_atomic_fetch_and_set)
> +#define pg_atomic_fetch_and_set(dst, src, value) \
> +	do { S_LOCK(&dummy_spinlock); \
> +	dst = src; \
> +	src = value; \
> +	S_UNLOCK(&dummy_spinlock); } while (0)
> +#endif
>
> Locking a dummy backend-local spinlock doesn't provide atomicity > across multiple processes. Right, of course. I don't know why I assumed that dummy_spinlock would be global. In any case, this is very WIP and doesn't even aspire to be portable yet. The main point was to see if there's any significant performance to be gained by this method. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Re: [GENERAL] pg_upgrade from 9.0.7 to 9.1.3: duplicate key pg_authid_oid_index
On Sat, Jun 02, 2012 at 05:10:03PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Fri, Jun 01, 2012 at 09:52:59AM -0400, Tom Lane wrote: > >> It seems that pg_upgrade needs a check to make sure that the bootstrap > >> superuser is named the same in old and new clusters. > > > The attached patch adds checks to verify the the old/new servers have > > the same install-user oid. > > That may or may not be a useful check to make, but it's got > approximately nothing to do with what I was complaining about. > > In particular, supposing that the user has given you a username that > isn't the bootstrap superuser in the new cluster, this patch is not > going to stop the update script from failing. Because the script is > then going to try to replace the bootstrap superuser, and that is > certainly going to give an error. > > I see the point of worrying about the install user as well as the > bootstrap superuser, but wouldn't it be best to insist they be the same? > Particularly in the new cluster, where if they aren't the same it means > the user has manually created at least one role in the new cluster, > which is likely to lead to OID conflicts or worse. > > Furthermore, if the bootstrap superusers aren't named the same, your > patch fails to handle the original complaint. In the case the > OP mentioned, the old cluster had > OID 10: "ubuntu" > some user-defined OID: "postgres" > and the new cluster had > OID 10: "postgres" > If the user tells pg_upgrade to use username postgres, your check will > not fail AFAICS, but nonetheless things are going to be messed up after > the upgrade, because some objects and privileges that used to belong to > the bootstrap superuser will now belong to a non-default superuser, > whereas what used to belong to the non-default superuser will now belong > to the bootstrap superuser. That cannot be thought desirable. 
For one > reason, in the old installation the postgres role could have been > dropped (possibly after dropping a few non-builtin objects) whereas the > "ubuntu" role was pinned. In the new installation, "postgres" is pinned > and "ubuntu" won't be. > > I think the checks that are actually needed here are (1) bootstrap > superusers are named the same, and (2) there are no roles other than the > bootstrap superuser in the new cluster. You are right that it is more complex than I stated, but given the limited feedback I got on the pg_upgrade/plpython issue, I figured people didn't want to hear the details. Here they are. There are three failure modes for pg_upgrade:

1. check failure
2. schema restore failure
3. silent failure/corruption

Of course, the latter items are worse than the earlier ones. The reporter got a "schema restore failure" while still following the pg_upgrade instructions. My initial patch changed that #2 error to a #1 error. Tom is right that creating users in the new cluster (against the instructions) can still generate a #2 error if new/old pg_authid.oid values match and they are not the install user, but since that is against the instructions, I was going to leave it as a #2. However, since Tom feels we should check that and make it a #1 failure, I have added that test to the attached patch. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d226f00..9f3dcda
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** check_new_cluster(void)
*** 138,144
	 * We don't restore our own user, so both clusters must match have
	 * matching install-user oids.
	 */
!	if (old_cluster.install_user_oid != new_cluster.install_user_oid)
		pg_log(PG_FATAL,
			   "Old and new cluster install users have different values for pg_authid.oid.\n");
--- 138,144
	 * We don't restore our own user, so both clusters must match have
	 * matching install-user oids.
	 */
!	if (old_cluster.install_role_oid != new_cluster.install_role_oid)
		pg_log(PG_FATAL,
			   "Old and new cluster install users have different values for pg_authid.oid.\n");
*** check_new_cluster(void)
*** 147,153
	 * defined users might match users defined in the old cluster and
	 * generate an error during pg_dump restore.
	 */
!	if (new_cluster.user_count != 1)
		pg_log(PG_FATAL,
			   "Only the install user can be defined in the new cluster.\n");
	check_for_prepared_transactions(&new_cluster);
--- 147,153
	 * defined users might match users defined in the old cluster and
	 * generate an error during pg_dump restore.
	 */
!	if (new_cluster.role_count != 1)
		pg_log(PG_FATAL,
			   "Only the install user can be defined in the new cluster.\n");
	check_for_prepared_transactions(&new_cluster);
*** check_is_super_user(ClusterInfo *cluster
*** 618,624
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, Jun 1, 2012 at 9:55 PM, Ants Aasma wrote: > On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure wrote: >> Buffer pins aren't a cache: with a cache you are trying to mask a slow >> operation (like a disk i/o) with a faster such that the amount of slow >> operations are minimized. Buffer pins however are very different in >> that we only care about contention on the reference count (the buffer >> itself is not locked!) which makes me suspicious that caching type >> algorithms are the wrong place to be looking. I think it comes to do >> picking between your relatively complex but general, lock displacement >> approach or a specific strategy based on known bottlenecks. > > I agree that pins aren't like a cache. I mentioned the caching > algorithms because they work based on access frequency and highly > contended locks are likely to be accessed frequently even from a > single backend. However this only makes sense for the delayed > unpinning method, and I also have come to the conclusion that it's not > likely to work well. Besides delaying cleanup, the overhead for the > common case of uncontended access is just too much. > > It seems to me that even the nailing approach will need a replacement > algorithm. The local pins still need to be published globally and > because shared memory size is fixed, the maximum amount of locally > pinned nailed buffers needs to be limited as well. > > But anyway, I managed to completely misread the profile that Sergey > gave. Somehow I missed that the time went into the retry TAS in slock > instead of the inlined TAS. This shows that the issue isn't just > cacheline ping-pong but cacheline stealing. This could be somewhat > mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert > posted earlier in another thread describes an approach for this. I > have a WIP patch (attached) that makes the clock sweep lock-free in > the common case. 
This patch gave a 40% performance increase for an > extremely allocation heavy load running with 64 clients on a 4 core 1 > socket system, lesser gains across the board. Pinning has a shorter > lock duration (and a different lock type) so the gain might be less, > or it might be a larger problem and post a higher gain. Either way, I > think the nailing approach should be explored further, cacheline > ping-pong could still be a problem with higher number of processors > and losing the spinlock also loses the ability to detect contention. Not sure about the rest of this patch, but this part is definitely bogus:

+#if !defined(pg_atomic_fetch_and_set)
+#define pg_atomic_fetch_and_set(dst, src, value) \
+	do { S_LOCK(&dummy_spinlock); \
+	dst = src; \
+	src = value; \
+	S_UNLOCK(&dummy_spinlock); } while (0)
+#endif

Locking a dummy backend-local spinlock doesn't provide atomicity across multiple processes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] How to avoid base backup in automated failover
Hi all, Recently I was writing an application to implement automated failover in this environment: two 2008 R2 servers, network area storage, asynchronous replication, WAL archiving enabled on the primary. Is there any way to avoid always starting the standby server from a base backup in automated failover? The database is growing huge, and I can't keep taking a base backup every day. Please suggest a solution. Regards, Reddy -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-avoid-base-backup-in-automated-failover-tp5711147.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_basebackup --xlog compatibility break
On Tue, 2012-05-29 at 22:31 +0200, Magnus Hagander wrote: > Yeah, good arguments all around, I agree too :-) Next question is - > suggestions for naming of said parameter? --xlog-method=something? And/or -Xsomething, which would automatically enable -x?