Re: [HACKERS] Function and view to retrieve WAL receiver status
On Sun, Dec 13, 2015 at 10:15 PM, Michael Paquier wrote:
> On Mon, Dec 14, 2015 at 3:09 PM, Gurjeet Singh wrote:
> > On Dec 13, 2015 9:56 PM, "Michael Paquier" wrote:
> >> If the node has no WAL receiver active, a tuple with NULL values is
> >> returned instead.
> >
> > IMO, in the absence of a WAL receiver the SRF (and the view) should not
> > return any rows.
>
> The whole point is to not use a SRF in this case: there is always at
> most one WAL receiver.

The function, maybe. But emitting an all-NULLs row from a view seems
counter-intuitive, at least when looking at it in the context of a
relational database.

--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] Function and view to retrieve WAL receiver status
On Dec 13, 2015 9:56 PM, "Michael Paquier" wrote:
>
> If the node has no WAL receiver active, a tuple with NULL values is
> returned instead.

IMO, in the absence of a WAL receiver the SRF (and the view) should not
return any rows.
Re: [HACKERS] Limit GIST_MAX_SPLIT_PAGES to XLR_MAX_BLOCK_ID
(adding Heikki, since that macro was touched by his commit 04e298b8)

Does my previous description of the problem make sense, or am I fretting
over something that's a non-issue?

Best regards,

On Sun, Sep 20, 2015 at 7:03 PM, Gurjeet Singh wrote:
> The Gin code respects XLR_MAX_BLOCK_ID when calling
> XLogEnsureRecordSpace(), but it appears that the Gist code does not
> try to limit its block-id consumption to XLR_MAX_BLOCK_ID.
>
> GIST_MAX_SPLIT_PAGES is hard-coded at 75, but XLogEnsureRecordSpace()
> would reject a request of more than 32 (XLR_MAX_BLOCK_ID).
>
> The attached patch redefines GIST_MAX_SPLIT_PAGES so that in case of a
> split, gistplacetopage() now throws an error when the block-ids needed
> exceed 32.
>
> I have used Min(75, XLR_MAX_BLOCK_ID) as the macro expansion, but I
> believe it can be set to plain XLR_MAX_BLOCK_ID.
>
> --
> Gurjeet Singh http://gurjeet.singh.im/

--
Gurjeet Singh http://gurjeet.singh.im/
[HACKERS] Limit GIST_MAX_SPLIT_PAGES to XLR_MAX_BLOCK_ID
The Gin code respects XLR_MAX_BLOCK_ID when calling
XLogEnsureRecordSpace(), but it appears that the Gist code does not try
to limit its block-id consumption to XLR_MAX_BLOCK_ID.

GIST_MAX_SPLIT_PAGES is hard-coded at 75, but XLogEnsureRecordSpace()
would reject a request of more than 32 (XLR_MAX_BLOCK_ID).

The attached patch redefines GIST_MAX_SPLIT_PAGES so that in case of a
split, gistplacetopage() now throws an error when the block-ids needed
exceed 32.

I have used Min(75, XLR_MAX_BLOCK_ID) as the macro expansion, but I
believe it can be set to plain XLR_MAX_BLOCK_ID.

--
Gurjeet Singh http://gurjeet.singh.im/

gist_limit_blockid_to_XLR_MAX_BLOCK_ID.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication slot restart_lsn initialization
On Mon, Aug 10, 2015 at 7:12 AM, Andres Freund wrote:
> On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:
> > /*
> > + * Grab and save an LSN value to prevent WAL recycling past that point.
> > + */
> > +void
> > +ReplicationSlotRegisterRestartLSN()
> > +{
>
> Didn't like that description and function name very much. What does
> 'grabbing' mean here? Should probably mention that it works on the
> currently active slot and modifies it.

In your version, I don't see a comment that refers to the fact that it
works on the currently active (global variable) slot.

> It's now ReplicationSlotReserveWal()

Okay.

> > +    ReplicationSlot *slot = MyReplicationSlot;
> > +
> > +    Assert(slot != NULL);
> > +    Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
> > +
> > +    /*
> > +     * The replication slot mechanism is used to prevent removal of required
> > +     * WAL. As there is no interlock between this and checkpoints required, WAL
> > +     * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
> > +     * been called to prevent that. In the very unlikely case that this happens
> > +     * we'll just retry.
> > +     */
>
> You removed some punctuation in that sentence converting a sentence in
> bad english into one without the original meaning ;). See the attached
> for a new version.

Your version looks better.

> > +/*
> >  * Flush all replication slots to disk.
> >  *
> >  * This needn't actually be part of a checkpoint, but it's a convenient
> > @@ -876,7 +942,7 @@ StartupReplicationSlots(void)
> >  }
> >
> >  /*
> > - * Manipulation of ondisk state of replication slots
> > + * Manipulation of on-disk state of replication slots
> >  *
> >  * NB: none of the routines below should take any notice whether a slot is the
> >  * current one or not, that's all handled a layer above.
> > diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
> > index 9a2793f..01b376a 100644
> > --- a/src/backend/replication/slotfuncs.c
> > +++ b/src/backend/replication/slotfuncs.c
> > @@ -40,6 +40,7 @@ Datum
> >  pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
> >  {
> >      Name        name = PG_GETARG_NAME(0);
> > +    bool        immediately_reserve = PG_GETARG_BOOL(1);
> >      Datum       values[2];
> >      bool        nulls[2];
> >      TupleDesc   tupdesc;
> > @@ -58,10 +59,28 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
> >      /* acquire replication slot, this will check for conflicting names */
> >      ReplicationSlotCreate(NameStr(*name), false, RS_PERSISTENT);
> >
> > -    values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> > +    if (immediately_reserve)
> > +    {
> > +        /* Allocate restart-LSN, if the user asked for it */
> > +        ReplicationSlotRegisterRestartLSN();
> > +
> > +        /* Write this slot to disk */
> > +        ReplicationSlotMarkDirty();
> > +        ReplicationSlotSave();
> >
> > -    nulls[0] = false;
> > -    nulls[1] = true;
> > +        values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> > +        values[1] = LSNGetDatum(MyReplicationSlot->data.restart_lsn);
> > +
> > +        nulls[0] = false;
> > +        nulls[1] = false;
> > +    }
> > +    else
> > +    {
> > +        values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> > +
> > +        nulls[0] = false;
> > +        nulls[1] = true;
> > +    }
>
> I moved
>     values[0] = NameGetDatum(&MyReplicationSlot->data.name);
>     nulls[0] = false;
> to outside the conditional block, there seems no reason to have it in
> there?

The assignment to values[0] is being done twice, so we can do away with
the one in the else part of the if condition.

Also, there was a typo in my patch [1] and it seems to have made it into
the commit. Tom seems to have just fixed it in commit 750fc78b.

Best regards,

[1]: I alerted you to it in a personal email, but it probably fell
through the cracks.

--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] 64-bit XIDs again
On Jul 30, 2015 2:23 PM, "Tom Lane" wrote:
>
> Gavin Flower writes:
> > On 31/07/15 02:24, Heikki Linnakangas wrote:
> >> There is a big downside to expanding xmin/xmax to 64 bits: it takes
> >> space. More space means more memory needed for caching, more memory
> >> bandwidth, more I/O, etc.
>
> > I think having a special case to save 32 bits per tuple would cause
> > unnecessary complications, and the savings are minimal compared to the
> > size of current modern storage devices and the typical memory used in
> > serious database servers.
>
> I think the argument that the savings are minimal is pretty thin.
> It all depends on how wide your tables are --- but on a narrow table, say
> half a dozen ints, the current tuple size is 24 bytes header plus the same
> number of bytes of data. We'd be going up to 32 bytes header which makes
> for a 16% increase in physical table size. If your table is large,
> claiming that 16% doesn't hurt is just silly.
>
> But the elephant in the room is on-disk compatibility. There is
> absolutely no way that we can just change xmin/xmax to 64 bits without a
> disk format break. However, if we do something like what Heikki is
> suggesting, it's at least conceivable that we could convert incrementally
> (ie, if you find a page with the old header format, assume all tuples in
> it are part of epoch 0; and do not insert new tuples into it unless there
> is room to convert the header to new format ... but I'm not sure what we
> do about tuple deletion if the old page is totally full and we need to
> write an xmax that's past 4G).

Can we safely relegate the responsibility of tracking the per-block
epoch to a relation fork?
Re: [HACKERS] "A huge debt of gratitude" - Michael Stonebraker
On Jul 22, 2015 12:07 PM, "Jolly Chen" wrote:
>
> Hey everyone,
>
> You have probably heard that Mike Stonebraker recently won the Turing
> award. A recording of his award lecture is available at:
> https://www.youtube.com/watch?v=BbGeKi6T6QI
>
> It is an entertaining talk overall. If you fast forward to about the
> 1:07 mark, he makes some comments about postgres.
>
> Here's my rough transcription:
>
> "The abstract data type system in postgres has been added to a lot of
> relational database systems. It's kind of de facto table stakes for
> relational databases these days, essentially intact. That idea was
> really a good one. It was mentioned in the citation for my Turing award
> winning. However, serendipity played a huge role, which is, the biggest
> impact of postgres by far came from two Berkeley students that I'll
> affectionately call Grumpy and Sleepy. They converted the academic
> postgres prototype from QUEL to SQL in 1995. This was in parallel to the
> commercial activity. And then a pick-up team of volunteers, none of whom
> have anything to do with me or Berkeley, have been shepherding that open
> source system ever since 1995. The system that you get off the web for
> postgres comes from this pick-up team. It is open source at its best and
> I want to just mention that I have nothing to do with that and that
> collection of folks we all owe a huge debt of gratitude to, because they
> have robustized that code line and made it so it really works."
>
> Thank you all so much for your hard work over the last twenty years!!
>
> Affectionately,
>
> Grumpy

Thank you! And a big thanks to the stewards of the project.

Sincerely,
Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard
On Tue, Jul 21, 2015 at 9:23 AM, Robert Haas wrote:
> rhaas=# create unique index on foo (a collate "C");
> CREATE INDEX
> rhaas=# alter table foo add primary key using index foo_a_idx;
> ALTER TABLE
>
> Now dump and restore this database. Then:
>
> Notice that the collation specifier is gone. Oops.

The intent of the 'USING INDEX' feature was to work around the problem
that PRIMARY KEY indexes cannot be reindexed concurrently to reduce
bloat. Considering that the feature doesn't work [1] (at least as shown
in the example in the docs [2]) if there are any foreign keys
referencing the table, there's scope for improvement.

There was a proposal to support reindexing the primary key index, but I
am not sure where that stands. If that's already in, or soon to be in
core, we may put a deprecation notice around USING INDEX. OTOH, if
reindexing primary keys is too difficult, we may want to support index
replacement via a top-level ALTER TABLE subcommand that works in
CONCURRENT mode, and not recommend the method shown in the docs (i.e.,
deprecate the USING INDEX syntax).

[1]: The DROP CONSTRAINT part of the command fails if there are any
foreign keys pointing to this table.
[2]: http://www.postgresql.org/docs/9.4/static/sql-altertable.html

--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard
On Wed, Jul 22, 2015 at 7:34 AM, Robert Haas wrote:
> On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier wrote:
> > On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas wrote:
> >> Notice that the collation specifier is gone. Oops.
> >
> > As it is not possible to specify directly a constraint for a PRIMARY
> > KEY expression, what about switching dumpConstraint to have it use
> > first a CREATE INDEX query with the collation and then use ALTER TABLE
> > to attach the constraint to it? I am noticing that we already fetch
> > the index definition in indxinfo via pg_get_indexdef. Thoughts?
>
> I guess the question on my mind is "Why is it that you can't do this
> directly when creating a primary key, but you can do it when turning
> an existing index into a primary key?"
>
> If there's a good reason for prohibiting doing this when creating a
> primary key, then presumably it shouldn't be allowed when turning an
> index into a primary key either. If there's not, then why not extend
> the PRIMARY KEY syntax to allow it?

+1.

I think in the short term we can treat this as a bug, and "fix" it by
disallowing an index with attributes that cannot be present in an index
created by a PRIMARY KEY constraint. The collation attribute on one of
the keys may be just one of many such attributes.

In the long term, we may want to allow collations in primary keys, but
that will be a feature ideally suited for a major version release.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] [PATCH] Function to get size of asynchronous notification queue
On Thu, Jun 25, 2015 at 8:43 PM, Brendan Jurd wrote:
> On Fri, 26 Jun 2015 at 06:03 Gurjeet Singh wrote:
>
>> s/proportion/fraction/
>
> I think of these as synonymous -- do you have any particular reason to
> prefer "fraction"? I don't feel strongly about it either way, so I'm
> quite happy to go with fraction if folks find that more expressive.

It just feels better to me in this context. If the number of times each
is used in the Postgres code is any measure, 'fraction' wins hands down:
"proportion": 33, "fraction": 620. I don't feel strongly about it,
either. I can leave it up to the committer to decide.

>> + * The caller must hold (at least) shared AsyncQueueLock.
>>
>> A possibly better wording: The caller must hold AsyncQueueLock in (at
>> least) shared mode.
>
> Yes, that is more accurate.

OK.

>> Unnecessary whitespace changes in pg_proc.h for existing functions.
>
> I did group the asynchronous notification functions together, which
> seemed reasonable as there are now three of them, and changed the
> tabbing between the function name and namespace ID to match, as is done
> elsewhere in pg_proc.h. I think those changes improve readability, but
> again I don't feel strongly about it.

Fair enough.

>> +DESCR("get the current usage of the asynchronous notification queue");
>>
>> A possibly better wording: get the fraction of the asynchronous
>> notification queue currently in use
>
> I have no objections to your wording.

OK. Please send a new patch with the changes you agree to, and I can
mark it ready for committer.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] More logging for autovacuum
On Tue, Jul 7, 2015 at 11:20 AM, Sawada Masahiko wrote:
> On Sat, Jul 4, 2015 at 2:30 PM, Amit Kapila wrote:
> > On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh wrote:
> >> log_min_messages acts as a single gate for everything headed for the
> >> server logs; controls for per-background process logging do not
> >> exist. If one wants to see DEBUG/INFO messages for just the
> >> Autovacuum (or checkpointer, bgwriter, etc.), they have to set
> >> log_min_messages to that level, but the result would be a lot of
> >> clutter from other processes to grovel through, to see the messages
> >> of interest.
> >
> > I think that will be quite helpful. During the patch development of
> > parallel sequential scan, it was quite helpful to see the LOG
> > messages of bgworkers, however one of the recent commits (91118f1a)
> > have changed those to DEBUG1, now if I have to enable all DEBUG1
> > messages, then what I need will be difficult to find in all the log
> > messages. Having control of separate logging for background tasks
> > will serve such a purpose.
> >
> >> The facilities don't yet exist, but it'd be nice if such parameters
> >> when unset (ie NULL) pick up the value of log_min_messages. So by
> >> default, the users will get the same behaviour as today, but can
> >> choose to tweak per background-process logging when needed.
> >
> > Will this proposal allow user to see all the messages by all the
> > background workers or will it be per background task. Example in some
> > cases user might want to see the messages by all bgworkers and in
> > some cases one might want to see just messages by AutoVacuum and it's
> > workers.

I want the logging to be controlled per background process, so that the
user can pick and choose how much detail they want from each process. In
the absence of such a setting for a background process, the global
log_min_messages should control the logging.
> > I think here designing the user interface is an important work, if
> > others also agree that such an option is useful; could you please
> > elaborate your proposal?
>
> +1

I would like this feature to be as simple as the current
log_min_messages; currently a superuser can do ALTER USER X SET
log_min_messages = Y, and control logging for specific users or
databases or a combination of those.

In the same vein, setting autovacuum.log_min_messages in
postgresql.conf, or a corresponding ALTER SYSTEM, should be enough to
use a different setting of log_min_messages inside the autovacuum
launcher and its worker processes. I am using the autovacuum process as
an example, but the same is applicable to other background processes as
well, e.g. checkpointer.log_min_messages.

As for the implementation details: upon startup, and after every reload,
the autovacuum launcher process will look for
autovacuum.log_min_messages being defined; if yes, it'd set the global
variable log_min_messages in code to that value, and if not defined in
any of the conf files, the global variable in code would be assigned
whatever is assigned to the GUC parameter log_min_messages.
DefineCustomXXVariable() seems incapable of supporting this behaviour,
so we might have to invent a new one.

This will allow us to keep the current behaviour as the default when
autovacuum.log_min_messages is not defined, and provide finer control
over autovacuum's logging by defining this variable when needed. The
same goes for the checkpointer, bgwriter, WAL writer, and bgworkers.

BTW, the facility we invent needn't be limited to log_min_messages, but
that's just what we need now.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
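A sketch of what the proposed knobs might look like in postgresql.conf.
None of these per-process parameters exist today; the names are
hypothetical and only illustrate the proposal above:

```
# Global default, as today
log_min_messages = warning

# Hypothetical per-background-process overrides; any parameter left
# unset would fall back to the global log_min_messages.
autovacuum.log_min_messages   = debug1
checkpointer.log_min_messages = info
bgwriter.log_min_messages     = warning
```

With such settings, only the autovacuum launcher and its workers would
emit DEBUG1 detail, while the rest of the server stays at the default
verbosity.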
Re: [HACKERS] replication slot restart_lsn initialization
On Tue, Jul 7, 2015 at 6:49 AM, Andres Freund wrote:
> On 2015-07-07 06:41:55 -0700, Gurjeet Singh wrote:
> > There seems to be a misplaced not operator ! in that if statement, as
> > well. That sucks :( The MacOS gcc binary is actually clang, and its
> > output is too noisy [1], which is probably why I might have missed
> > the warning, if any.
>
> Which version of clang is it? With newer ones you can individually
> disable nearly all of the warnings. I regularly use clang, and most of
> the time it compiles master without warnings.

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix

I see that -Wno-parentheses-equality might help, but I am not looking
forward to maintaining OS-specific flags for
clang-that-pretends-to-be-gcc in my shell scripts [2].

[2] https://github.com/gurjeet/pgd

Attached is a patch that introduces SlotIsPhysical/SlotIsLogical
macros. This patch, if accepted, goes on top of the v4 patch.

> > > > pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
> > > > {
> > > >     Name        name = PG_GETARG_NAME(0);
> > > > +   bool        activate = PG_GETARG_BOOL(1);
> > >
> > > Don't like 'activate' much as a name. How about
> > > 'immediately_reserve'?
> >
> > I still like 'activate', but okay. How about 'reserve_immediately'
> > instead?
>
> Activate is just utterly wrong. A slot isn't inactive before. And
> 'active' already is used for slots that are currently in use
> (i.e. connected to).
>
> > Also, do you want this name change just in the C code, or in the
> > pg_proc and docs as well?
>
> All.

Patch v4 attached.

On a side note, I see that the pg_create_*_replication_slot() functions
do not behave transactionally; that is, rolling back a transaction does
not undo the slot creation.
Should we prevent these, and the corresponding drop functions, from
being called inside an explicit transaction? PreventTransactionChain()
is geared towards serving just the utility commands. Do we protect
against calling such functions in a transaction block, or from user
functions? How?

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

physical_repl_slot_activate_restart_lsn.v4.patch
Description: Binary data

slot_is_physical_or_logical_macros.patch
Description: Binary data
Re: [HACKERS] replication slot restart_lsn initialization
On Tue, Jul 7, 2015 at 4:59 AM, Andres Freund wrote:
> On 2015-06-10 13:13:41 -0700, Gurjeet Singh wrote:
> > +    /*
> > +     * Log an xid snapshot for logical replication. It's not needed
> > +     * for physical slots as it is done in BGWriter on a regular
> > +     * basis.
> > +     */
> > +    if (!slot->data.persistency == RS_PERSISTENT)
> > +    {
> > +        /* make sure we have enough information to start */
> > +        flushptr = LogStandbySnapshot();
> > +
> > +        /* and make sure it's fsynced to disk */
> > +        XLogFlush(flushptr);
> > +    }
>
> Huh? The slot->data.persistency == RS_PERSISTENT check seems pretty
> much entirely random to me.

There seems to be a misplaced not operator ! in that if statement, as
well. That sucks :( The MacOS gcc binary is actually clang, and its
output is too noisy [1], which is probably why I might have missed the
warning, if any.

[1]: I am particularly annoyed by these:

note: remove extraneous parentheses around the comparison to silence
this warning
note: use '=' to turn this equality comparison into an assignment

E.g.: if (((opaque)->btpo_next == 0))

I'll see what I can do about these.

> I mean physical slots can (and normally are) persistent as well?
> Instead check whether it's a database specific slot.

Agreed, the slot being database-specific is the right indicator.

> The reasoning why it's not helpful for physical slots is wrong. The
> point is that a standby snapshot at that location will simply not help
> physical replication, it's only going to read ones after the location
> at which the base backup starts (i.e. the location from the backup
> label).
>
> > pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
> > {
> >     Name        name = PG_GETARG_NAME(0);
> > +   bool        activate = PG_GETARG_BOOL(1);
>
> Don't like 'activate' much as a name. How about 'immediately_reserve'?

I still like 'activate', but okay. How about 'reserve_immediately'
instead?

Also, do you want this name change just in the C code, or in the pg_proc
and docs as well?
> Other than that it's looking good to me.

Will send a new patch after your feedback on the 'activate' renaming.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] More logging for autovacuum
On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera wrote:
> Gregory Stark wrote:
> >
> > I'm having trouble following what's going on with autovacuum and I'm
> > finding the existing logging insufficient. In particular that it's
> > only logging vacuum runs *after* the vacuum finishes makes it hard to
> > see what vacuums are running at any given time. Also, I want to see
> > what is making autovacuum decide to forgo vacuuming a table and the
> > log with that information is at DEBUG2.
>
> So did this idea go anywhere?

Assuming the thread stopped here, I'd like to rekindle the proposal.

log_min_messages acts as a single gate for everything headed for the
server logs; controls for per-background process logging do not exist.
If one wants to see DEBUG/INFO messages for just the Autovacuum (or
checkpointer, bgwriter, etc.), they have to set log_min_messages to that
level, but the result would be a lot of clutter from other processes to
grovel through, to see the messages of interest.

The facilities don't yet exist, but it'd be nice if such parameters,
when unset (ie NULL), pick up the value of log_min_messages. So by
default, users will get the same behaviour as today, but can choose to
tweak per background-process logging when needed.

Absent such a feature, one hack is to set the desired log_min_messages
value in the conf file and send SIGHUP to just the process of interest,
then revert the conf file; but it's a hack.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] drop/truncate table sucks for large values of shared buffers
On Fri, Jun 26, 2015 at 9:45 PM, Amit Kapila wrote:
> Sometime back on one of the PostgreSQL blogs [1], there was discussion
> about the performance of drop/truncate table for large values of
> shared_buffers, and it seems that as the value of shared_buffers
> increases, the performance of drop/truncate table becomes worse. I
> think those are not often used operations, so it never became a
> priority to look into improving them if possible.
>
> I have looked into it and found that the main reason for such
> behaviour is that for those operations it traverses the whole of
> shared_buffers, and it seems to me that we don't need that, especially
> for not-so-big tables. We can optimize that path by looking into the
> buffer mapping table for the pages that exist in shared_buffers, for
> the case when the table size is less than some threshold (say 25%) of
> shared buffers.
>
> Attached patch implements the above idea and I found that performance
> doesn't dip much with the patch, even with large values of
> shared_buffers. I have also attached the script and sql file used to
> take the performance data.

+1 for the effort to improve this.

With your technique added, there are 3 possible ways the search can
happen: a) scan NBuffers and scan the list of relations, b) scan
NBuffers and bsearch the list of relations, and c) scan the list of
relations and then invalidate blocks of each fork from shared buffers.
Would it be worth finding one technique that can serve decently from the
low end of shared_buffers to the high end?

On the patch:

There are multiple naming styles being used in
DropForkSpecificBuffers(); my_name and myName. Given this is a new
function, it'd help to be consistent.

s/blk_count/blockNum/

s/new//, for e.g. newTag, because there's no corresponding tag/oldTag
variable in the function.

s/blocksToDel/blocksToDrop/. BTW, we never pass anything other than the
total number of blocks in the fork, so we may as well call it just
numBlocks.
s/traverse_buf_freelist/scan_shared_buffers/, because when it is true,
we scan the whole of shared_buffers.

s/rel_count/rel_num/

Reduce the indentation/tabs in the header comments of
DropForkSpecificBuffers(). But I see there's precedent in neighboring
functions, so this may be okay.

Doing pfree() of num_blocks, num_fsm_blocks and num_vm_blocks in one
place (instead of two, at different indentation levels) would help
readability.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] [PATCH] Function to get size of asynchronous notification queue
Patch reviewed following the instructions on
https://wiki.postgresql.org/wiki/Reviewing_a_Patch

# Submission review

- Is the patch in a patch format which has context? Yes.
  Note to other reviewers: `git diff --patience` yields a patch better
  suited for readability.
- Does it apply cleanly to the current git master? Yes.
- Does it include reasonable tests, necessary doc patches, etc?
  Doc patch - Yes. Tests - Yes.

# Usability review

- Does the patch actually implement the feature? Yes.
- Do we want that? Yes; see the discussion on the mailing list.
- Do we already have it? No.
- Does it follow the SQL spec, or the community-agreed behavior? Yes.
  It seems to implement the behavior agreed upon in the mailing list.
- Does it include pg_dump support (if applicable)? N/A
- Are there dangers? None that I could spot.
- Have all the bases been covered? There's room for an additional test
  which tests for a non-zero return value.

# Feature test

- Does the feature work as advertised? Yes. Build configured with
  '--enable-debug --enable-cassert CFLAGS=-O0'. With a slightly
  aggressive notifications-in-a-loop script I was able to see a
  non-zero return value:

  Session 1: listen ggg; begin;
  Session 2: while sleep 0.1; do echo 'notify ggg; select
  pg_notification_queue_usage();' ; done | psql

- Are there corner cases the author has failed to consider? No.
- Are there any assertion failures or crashes? The patch exposes an
  already available function to the SQL interface, and rearranges code,
  so it doesn't look like the patch can induce an assertion failure or
  a crash.
- Performance review: Not evaluated, since it's not a performance
  patch, and it doesn't seem to impact any performance-critical code
  path, either.

Additional notes:

The patch updates the docs of another function (pg_listening_channels),
but the update is an improvement so we can let it slide :)

+ proportion of the queue that is currently occupied by pending
notifications.
s/proportion/fraction/

+ * The caller must hold (at least) shared AsyncQueueLock.

A possibly better wording: The caller must hold AsyncQueueLock in (at
least) shared mode.

Unnecessary whitespace changes in pg_proc.h for existing functions.

+DESCR("get the current usage of the asynchronous notification queue");

A possibly better wording: get the fraction of the asynchronous
notification queue currently in use

On Thu, Jun 18, 2015 at 10:47 AM, Merlin Moncure wrote:
> On Wed, Jun 17, 2015 at 6:15 PM, Brendan Jurd wrote:
> > Posting v2 of the patch, incorporating some helpful suggestions from
> > Merlin.
>
> Based on a perfunctory scan of the code, I think this is gonna make it,
> unless you draw some objections based on lack of necessity.
>
> merlin

--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] [PATCH] Function to get size of asynchronous notification queue
I don't see this in the CF app; can you please add it there?

Best regards,

On Wed, Jun 17, 2015 at 3:31 AM, Brendan Jurd wrote:
> Hello hackers,
>
> I present a patch to add a new built-in function
> pg_notify_queue_saturation().
>
> The purpose of the function is to allow users to monitor the health of
> their notification queue. In certain cases, a client connection
> listening for notifications might get stuck inside a transaction, and
> this would cause the queue to keep filling up, until finally it reaches
> capacity and further attempts to NOTIFY error out.
>
> The current documentation under LISTEN explains this possible gotcha,
> but doesn't really suggest a useful way to address it, except to
> mention that warnings will show up in the log once you get to 50%
> saturation of the queue. Unless you happen to be eyeballing the logs
> when it happens, that's not a huge help. The choice of 50% as a
> threshold is also very much arbitrary, and by the time you hit 50% the
> problem has likely been going on for quite a while. If you want your
> nagios (or whatever) to, say, alert you when the queue goes over 5% or
> 1%, your options are limited and awkward.
>
> The patch has almost no new code. It makes use of the existing logic
> for the 50% warning. I simply refactored that logic into a separate
> function asyncQueueSaturation, and then added
> pg_notify_queue_saturation to make that available in SQL.
>
> I am not convinced that pg_notify_queue_saturation is the best possible
> name for this function, and am very much open to other suggestions.
>
> The patch includes documentation, a regression test and an isolation
> test.
>
> Cheers,
> BJ

--
Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] replication slot restart_lsn initialization
On Wed, Jun 10, 2015 at 9:10 AM, Gurjeet Singh wrote: > > I am in the process of writing up a doc patch, and will submit that as > well in a short while. > Please find attached the patch with the doc update. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ physical_repl_slot_activate_restart_lsn.v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication slot restart_lsn initialization
On Wed, Jun 10, 2015 at 8:36 AM, Andres Freund wrote: > On 2015-06-10 08:24:23 -0700, Gurjeet Singh wrote: > > On Wed, Jun 10, 2015 at 8:07 AM, Andres Freund > wrote: > > > That doesn't look right to me. Why is this code logging a standby > > > snapshot for physical slots? > > > > > > > This is the new function I referred to above. The logging of the snapshot > > is in 'not RecoveryInProgress()' case, meaning it's running in primary > and > > not in a standby. > > It's still done uselessly for physical slots? > I missed the comments on LogStandbySnapshot(). Attached is the new patch which does the snapshot logging only for a logical replication slot. I am in the process of writing up a doc patch, and will submit that as well in a short while. I have removed a few #include statements which seemed unnecessary. These changes are not relevant to the patch, so I can readily revert those parts if deemed necessary. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ physical_repl_slot_activate_restart_lsn.v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication slot restart_lsn initialization
On Wed, Jun 10, 2015 at 8:07 AM, Andres Freund wrote: > On 2015-06-10 08:00:28 -0700, Gurjeet Singh wrote: > > > pg_create_logical_replication_slot() prevents LSN from being > > recycled that by looping (worst case 2 times) until there's no > > conflict with the checkpointer recycling the segment. So I have used > > the same approach. > > There's no need to change anything for logical slots? Or do you just > mean that you moved the existing code? > Yes, I turned the code from logical replication into a function and used it from logical and physical replication. > > > > /* > > + * Grab and save an LSN value to prevent WAL recycling past that point. > > + */ > > +void > > +ReplicationSlotRegisterRestartLSN() > > +{ ... > > + /* > > + * Let's start with enough information if we can, so log a > standby > > + * snapshot and start decoding at exactly that position. > > + */ > > + if (!RecoveryInProgress()) > > + { > > + XLogRecPtr flushptr; > > + > > + /* start at current insert position */ > > + slot->data.restart_lsn = GetXLogInsertRecPtr(); > > + > > + /* make sure we have enough information to start */ > > + flushptr = LogStandbySnapshot(); > > + > > + /* and make sure it's fsynced to disk */ > > + XLogFlush(flushptr); > > + } > > + else > > + slot->data.restart_lsn = GetRedoRecPtr(); > > + > > That doesn't look right to me. Why is this code logging a standby > snapshot for physical slots? > This is the new function I referred to above. The logging of the snapshot is in 'not RecoveryInProgress()' case, meaning it's running in primary and not in a standby. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] replication slot restart_lsn initialization
On Tue, May 5, 2015 at 5:53 PM, Andres Freund wrote:
>
> > Was there any consideration for initializing restart_lsn to the latest
> > WAL write pointer when a slot is created? Or for allowing an optional
> > parameter in pg_create_(physical|logical)_replication_slot() for
> > specifying the restart_lsn at slot creation?
>
> I've been wondering about allowing for the latter alternative. I could
> have used it a couple times. The former doesn't make much sense to me,
> it could be too far *ahead* in many cases actually. A patch for this
> would be fairly trivial.

Attached is the patch that takes the former approach (initialize restart_lsn when the slot is created). I think it's better than the latter approach (depending on the user to specify an LSN), because the LSN the user specifies may have already been recycled.

pg_create_logical_replication_slot() prevents the LSN from being recycled by looping (worst case, 2 times) until there's no conflict with the checkpointer recycling the segment. So I have used the same approach.

The function pg_create_physical_replication_slot() now has an additional boolean parameter 'activate' which the user can use to allocate restart_lsn as part of the creation process.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/

physical_repl_slot_activate_restart_lsn.patch
Description: Binary data
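To illustrate the proposed interface (a sketch only; the 'activate' parameter exists in the attached patch, not in core PostgreSQL, and 'standby_1' is a placeholder slot name):

```sql
-- With the patch applied: reserve WAL (set restart_lsn) at creation time.
SELECT * FROM pg_create_physical_replication_slot('standby_1', true);

-- Verify; without activation, restart_lsn stays NULL until a walsender
-- actually starts using the slot.
SELECT slot_name, restart_lsn FROM pg_replication_slots;
```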
[HACKERS] Interval arithmetic should emit interval in canonical format
It's hard to argue that the current behaviour is wrong, but it's worth a try.

First, I'd appreciate the "official" reasons why Postgres prefers to keep interval values in non-canonical form, like '1 day -23:37:00' instead of '00:23:00'. I understand it has something to do with a year/month/day not being exactly 365 days/30 days/24 hours, and/or operations involving interval and 'timestamp with time zone'. But since it's not explicitly spelled out in the docs or in code (at least I didn't find it in the obvious places), I'm seeking an explanation here. I understand that the answers may obviate the change in behaviour I am requesting below.

The interval arithmetic operations may also yield non-canonical values, and IMHO the 'interval op interval' or 'interval op scalar' expressions should yield an interval in canonical form. For example:

postgres=# select '6 days 00:16:00'::interval - '5 days 23:53:00'::interval as result;
     result
-----------------
 1 day -23:37:00

postgres=# select '6 days 00:16:00'::interval + '5 days 23:53:00'::interval as result;
      result
------------------
 11 days 24:09:00

I cannot think of a use case where the above results are any better than emitting '00:23:00' and '12 days 00:09:00', respectively. We may not be able to turn every interval datum into canonical form, but at least the intervals produced as a result of interval operators can be converted to canonical form to reduce surprises for users. I may even go as far as proposing rounding up 24 hours into a day, but not rounding up days into months or months into years.

I was surprised by the presence of a non-canonical interval in a sorted-by-interval result set. The intervals were computed within the query, using 'timestamp without time zone' values in a table.

# select ...
     result
-----------------
 ...
 00:23:00
 00:23:00
 1 day -23:37:00
 00:23:00
 00:22:00
 ...

The ordering above demonstrates that Postgres _does_ consider '1 day -23:37:00' == '00:23:00', so it seems pointless to confuse the user by showing two different representations of the same datum.

This also increases the code complexity required in applications/ORMs to parse interval data's text representation.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company
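For what it's worth, the existing justify_hours()/justify_interval() functions can be used to canonicalize such results explicitly; the request above is essentially for the operators to do this by default. A sketch:

```sql
-- Explicitly canonicalize the results of interval arithmetic:
SELECT justify_interval('6 days 00:16:00'::interval - '5 days 23:53:00'::interval);
-- should yield 00:23:00 rather than '1 day -23:37:00'

SELECT justify_hours('6 days 00:16:00'::interval + '5 days 23:53:00'::interval);
-- should yield 12 days 00:09:00 rather than '11 days 24:09:00'
```

Note that justify_interval() also folds 30-day runs into months, which goes further than the days-but-not-months rounding proposed above.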
[HACKERS] Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally.
On Wed, Jul 2, 2014 at 3:49 PM, Kevin Grittner wrote:
> In preparing to push the patch, I noticed I hadn't responded to this:
>
> Gurjeet Singh wrote:
>> Kevin Grittner wrote:
>>> I have reviewed this patch, and think we should do what the patch
>>> is trying to do, but I don't think the submitted patch would
>>> actually work.
>>
>> Just curious, why do you think it won't work.
>
> Because you didn't touch this part of the function:
>
>     /*
>      * Send partial messages.  If force is true, make sure that any pending
>      * xact commit/abort gets counted, even if no table stats to send.
>      */
>     if (regular_msg.m_nentries > 0 ||
>         (force && (pgStatXactCommit > 0 || pgStatXactRollback > 0)))
>         pgstat_send_tabstat(&regular_msg);
>
> The statistics would not actually be sent unless a table had been
> accessed or it was forced because the connection was closing.

I sure did! In fact, that was the one and only line of code that was changed. It effectively bypassed the 'force' check if there were any transaction stats to report.

 	/*
-	 * Send partial messages.  If force is true, make sure that any pending
-	 * xact commit/abort gets counted, even if no table stats to send.
+	 * Send partial messages.  Make sure that any pending xact commit/abort gets
+	 * counted, even if no table stats to send.
 	 */
 	if (regular_msg.m_nentries > 0 ||
-		(force && (pgStatXactCommit > 0 || pgStatXactRollback > 0)))
+		force || (pgStatXactCommit > 0 || pgStatXactRollback > 0))
 		pgstat_send_tabstat(&regular_msg);
 	if (shared_msg.m_nentries > 0)
 		pgstat_send_tabstat(&shared_msg);

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company
Re: [HACKERS] Proposing pg_hibernate
On Sat, Jun 7, 2014 at 6:48 AM, Cédric Villemain wrote: > Le lundi 3 février 2014 19:18:54 Gurjeet Singh a écrit : > >> Possible enhancements: >> - Ability to save/restore only specific databases. >> - Control how many BlockReaders are active at a time; to avoid I/O >> storms. FWIW, this has been implemented. By default, only one database is restored at a time. >>- Be smart about lowered shared_buffers across the restart. >> - Different modes of reading like pg_prewarm does. >> - Include PgFincore functionality, at least for Linux platforms. > > Please note that pgfincore is working on any system where PostgreSQL > prefetch is working, exactly like pg_prewarm. This includes linux, BSD and > many unix-like. It *is not* limited to linux. > > I never had a single request for windows, but windows does provides an API > for that too (however I have no windows offhand to test). > > Another side note is that currently BSD (at least freeBSD) have a more > advanced mincore() syscall than linux and offers a better analysis (dirty > status is known) and they implemented posix_fadvise... I have never used pgfincore, and have analyzed it solely based on the examples provided, second-hand info, and some code reading, so the following may be wrong; feel free to correct. The UI of pgfincore suggests that to save a snapshot of an object, pgfincore reads all the segments of the object and queries the OS cache. This may take a lot of time on big databases. If this is done at shutdown time, the time to finish shutdown will be proportional to the size of the database, rather than being proportional to the amount of data files in OS cache. > > There is a previous thread about that hibernation feature. Mitsuru IWASAKI > did a patch, and it triggers some interesting discussions. 
> Some notes in this thread are outdated now, but it's worth having a look at it:
>
> http://www.postgresql.org/message-id/20110504.231048.113741617.iwas...@jp.freebsd.org
>
> https://commitfest.postgresql.org/action/patch_view?id=549

Thanks for sharing these. I agree with Greg's observation there that shared buffers are becoming an increasingly smaller subset of the RAM available on modern machines. But until it can be done in a platform-independent way, I doubt it will ever be accepted in Postgres. Even if it's accepted, it would have to be off by default because of the slow shutdown mentioned above.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company
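For contrast with pg_hibernate's automatic save/restore, the manual warm-up that core offers via the pg_prewarm extension (mentioned above) requires naming each relation to load; 'my_table' below is a placeholder name:

```sql
-- Manual warm-up with contrib's pg_prewarm (PostgreSQL 9.4+):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_table', 'buffer');    -- read into shared_buffers
SELECT pg_prewarm('my_table', 'prefetch');  -- async prefetch into OS cache only
```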
Re: [HACKERS] Proposing pg_hibernate
On Sun, Jun 15, 2014 at 2:51 AM, Amit Kapila wrote: > On Thu, Jun 12, 2014 at 9:31 AM, Gurjeet Singh wrote: >> >> I don't have intimate knowledge of recovery but I think the above >> assessment of recovery's operations holds true. If you still think >> this is a concern, can you please provide a bit firm example using >> which I can visualize the problem you're talking about. > > Okay, let me try with an example: > > Assume No. of shared buffers = 5 > Before Crash: > > 1. Pages in shared buffers numbered 3, 4, 5 have write operations > performed on them, followed by lot of reads which makes their > usage_count as 5. > 2. Write operation happened on pages in shared buffers numbered > 1, 2. Usage_count of these buffers is 1. > 3. Now one read operation needs some different pages and evict > pages in shared buffers numbered 1 and 2 and read the required > pages, so buffers 1 and 2 will have usage count as 1. > 4. At this moment shutdown initiated. > 5. Bgwriter saved just buffers 1 and 2 and crashed. > > After Crash: > > 1. Recovery will read in pages on which operations happened in > step-1 and 2 above. > 2. Buffer loader (pg_hibernator) will load buffers on which operations > happened in step-3, so here it might needs to evict buffers which are > corresponding to buffers of step-1 before crash. So what this > essentially means is that pg_hibernator can lead to eviction of more > useful pages. Granted, you have demonstrated that the blocks restored by pg_hibernator can cause eviction of loaded-by-recovery blocks. But, one can argue that pg_hibernator brought the shared-buffer contents to to a state that is much closer to the pre-shutdown state than the recovery would have restored them to. I think this supports the case for pg_hibernator, that is, it is doing what it is supposed to do: restore shared-buffers to pre-shutdown state. I agree that there's uncertainty as to which buffers will be cleared, and hence which blocks will be evicted. 
So pg_hibernator may cause eviction of blocks that had higher usage count before the shutdown, because they may have a lower/same usage count as other blocks' buffers after recovery. There's not much that can be done for that, because usage count information is not saved anywhere on disk, and I don't think it's worth saving just for pg_hibernator's sake. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
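Although usage counts are not persisted anywhere, they can at least be inspected on a running server with the contrib pg_buffercache extension, for example:

```sql
-- Per-relation buffer counts and average usage_count (current database only);
-- requires the contrib pg_buffercache extension.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname,
       count(*) AS buffers,
       round(avg(b.usagecount), 2) AS avg_usage_count
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Comparing this output before shutdown and after a restore would show how closely the restored cache state matches the saved one.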
[HACKERS] Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally.
On Tue, Jul 1, 2014 at 10:05 AM, Kevin Grittner wrote:
> Tom Lane wrote:
>
>> If we're going to do it like this, then I think the force flag
>> should be considered to do nothing except override the clock
>> check, which probably means it shouldn't be tested in the initial
>> if() at all.
>
> That makes sense, and is easily done.

Attached is the patch, to save you a few keystrokes :)

> The only question left is
> how far back to take the patch.  I'm inclined to only apply it to
> master and 9.4.  Does anyone think otherwise?

Considering this a bug fix, I'd vote for it to be applied to all supported releases. But since it may cause an unforeseen performance penalty, I think it should be applied only as far back as the introduction of the PGSTAT_STAT_INTERVAL throttle. The throttle was implemented in 641912b, which AFAICT was part of 8.3. So I guess all the supported releases it is.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3ab1428..c7f41a5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -753,7 +753,8 @@ pgstat_report_stat(bool force)
 	/* Don't expend a clock check if nothing to do */
 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
-		!have_function_stats && !force)
+		pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+		!have_function_stats)
 		return;
 
 	/*
@@ -817,11 +818,11 @@
 	}
 
 	/*
-	 * Send partial messages.  If force is true, make sure that any pending
-	 * xact commit/abort gets counted, even if no table stats to send.
+	 * Send partial messages.  Make sure that any pending xact commit/abort
+	 * gets counted, even if there are no table stats to send.
 	 */
 	if (regular_msg.m_nentries > 0 ||
-		(force && (pgStatXactCommit > 0 || pgStatXactRollback > 0)))
+		pgStatXactCommit > 0 || pgStatXactRollback > 0)
 		pgstat_send_tabstat(&regular_msg);
 	if (shared_msg.m_nentries > 0)
 		pgstat_send_tabstat(&shared_msg);
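The counters at issue here feed the xact_commit/xact_rollback columns of the pg_stat_database view, which is where otherwise-unreported commits and aborts would go missing:

```sql
-- Observe the transaction counters reported via the stats collector:
SELECT datname, xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = current_database();
```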
[HACKERS] Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally.
On Sun, Jun 29, 2014 at 11:58 AM, Kevin Grittner wrote:
> I have reviewed this patch, and think we should do what the patch
> is trying to do, but I don't think the submitted patch would
> actually work.

Just curious: why do you think it won't work? The discussion is a bit old, but I'm sure I would have tested the patch before submitting it.

> I have attached a suggested patch which I think
> would work.  Gurjeet, could you take a look at it?

The patch, when considered together with Tom's suggestion upthread, looks good to me.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Mon, Jun 23, 2014 at 12:49 PM, Tom Lane wrote: > Gurjeet Singh writes: >> While I'd love to reduce the number of future installations without >> this fix in place, I respect the decision to honor project policy. At >> the same time, this change does not break anything. It introduces new >> environment variables which change the behaviour, but behaves the old >> way in the absence of those variables. > > Uh, no, it doesn't. We removed the dependence on -DLINUX_OOM_SCORE_ADJ. > If a packager is expecting that to still work in 9.4, he's going to be > unpleasantly surprised, because the system will silently fail to do what > he's expecting: it will run all the backend processes at no-OOM-kill > priority, which is likely to be bad. True. I didn't think from a packager's perspective. > It is possible to make packages that will work either way, along the lines > of the advice I sent to the Red Hat guys: > https://bugzilla.redhat.com/show_bug.cgi?id=1110969 > > but I think it's a bit late in the cycle to be telling packagers to do > that for 9.4. Understandable. >> BTW, does the project publish the feature-freeze deadlines and other >> dates somewhere (apart from on this list). > > It's usually in the dev meeting minutes > https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#9.5_Schedule Thanks. But it doesn't spell out the specific dates, or even the month of feature-freeze. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Mon, Jun 23, 2014 at 11:52 AM, Tom Lane wrote:
> Gurjeet Singh writes:
>> would it be possible to include this in 9.4 as well?
>
> While this is clearly an improvement over what we had before, it's
> impossible to argue that it's a bug fix, and we are way past the 9.4
> feature freeze deadline. In particular, packagers who've already done
> their 9.4 development work might be blindsided by us slipping this into
> 9.4 release. So while I wouldn't have a problem with putting this change
> into 9.4 from a technical standpoint, it's hard to argue that it'd meet
> project norms from a development-process standpoint.

While I'd love to reduce the number of future installations without this fix in place, I respect the decision to honor project policy.

At the same time, this change does not break anything. It introduces new environment variables which change the behaviour, but behaves the old way in the absence of those variables. So I guess it's not changing the default behaviour in an incompatible way.

BTW, does the project publish the feature-freeze deadlines and other dates somewhere (apart from on this list)? I was looking the other day and didn't find any reference. Either the commitfest app or the 'Developer' section of the wiki [1] seems a good candidate for this kind of information.

[1]: https://wiki.postgresql.org/wiki/Development_information

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Wed, Jun 18, 2014 at 8:15 PM, Tom Lane wrote:
> Gurjeet Singh writes:
>
>> Please find attached the patch. It includes the doc changes as well.
>
> Applied with some editorialization.

Thanks! Would it be possible to include this in 9.4 as well?

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
Thanks! On Mon, Jun 16, 2014 at 3:58 PM, Tom Lane wrote: > I wrote: >> Gurjeet Singh writes: >>> I tried to eliminate the 'pending' list, but I don't see a way around it. >>> We need temporary storage somewhere to store the branches encountered on >>> the right; in recursion case the call stack was serving that purpose. > >> I still think we should fix this in the grammar, rather than introducing >> complicated logic to try to get rid of the recursion later. For example, >> as attached. > > I went looking for (and found) some additional obsoleted comments, and > convinced myself that ruleutils.c is okay as-is, and pushed this. > > regards, tom lane -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane wrote: > If we're going to do this, I would say that we should also take the > opportunity to get out from under the question of which kernel API > we're dealing with. So perhaps a design like this: > > 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's > the name of a file to write something into after forking. The typical > value would be "/proc/self/oom_score_adj" or "/proc/self/oom_adj". > If not set, nothing happens. > > 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's > the string we write, otherwise we write "0". Please find attached the patch. It includes the doc changes as well. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 9fadef5..4492a1d 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1284,8 +1284,15 @@ echo -1000 > /proc/self/oom_score_adj in the postmaster's startup script just before invoking the postmaster. Note that this action must be done as root, or it will have no effect; so a root-owned startup script is the easiest place to do it. If you -do this, you may also wish to build PostgreSQL -with -DLINUX_OOM_SCORE_ADJ=0 added to CPPFLAGS. +do this, you may also wish to export these environment variables + +PG_OOM_ADJUST_FILE=oom_score_adj +PG_OOM_ADJUST_VALUE=0 + +export PG_OOM_ADJUST_FILE +export PG_OOM_ADJUST_VALUE + +in the startup script, before invoking the postmaster. That will cause postmaster child processes to run with the normal oom_score_adj value of zero, so that the OOM killer can still target them at need. @@ -1296,8 +1303,7 @@ echo -1000 > /proc/self/oom_score_adj but may have a previous version of the same functionality called /proc/self/oom_adj. This works the same except the disable value is -17 not -1000. The corresponding -build flag for PostgreSQL is --DLINUX_OOM_ADJ=0. 
+value for PG_OOM_ADJUST_FILE should be oom_adj. diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c index f6df2de..21559af 100644 --- a/src/backend/postmaster/fork_process.c +++ b/src/backend/postmaster/fork_process.c @@ -22,6 +22,13 @@ #endif #ifndef WIN32 + +#ifdef __linux__ +static bool oom_env_checked = false; +static char oom_adj_file[MAXPGPATH]; +static int oom_adj_value = 0; +#endif /* __linux__ */ + /* * Wrapper for fork(). Return values are the same as those for fork(): * -1 if the fork failed, 0 in the child process, and the PID of the @@ -78,13 +85,38 @@ fork_process(void) * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you * want child processes to adopt here. */ -#ifdef LINUX_OOM_SCORE_ADJ +#ifdef __linux__ + if (!oom_env_checked) + { + char *env; + + oom_env_checked = true; + + env = getenv("PG_OOM_ADJUST_FILE"); + + /* Don't allow a path separator in file name */ + if (env && !strchr(env, '/') && !strchr(env, '\\')) + { +snprintf(oom_adj_file, MAXPGPATH, "/proc/self/%s", env); + +env = getenv("PG_OOM_ADJUST_VALUE"); +if (env) +{ + oom_adj_value = atoi(env); +} +else + oom_adj_value = 0; + } + else +oom_adj_file[0] = '\0'; + } + { /* * Use open() not stdio, to ensure we control the open flags. Some * Linux security environments reject anything but O_WRONLY. */ - int fd = open("/proc/self/oom_score_adj", O_WRONLY, 0); + int fd = open(oom_adj_file, O_WRONLY, 0); /* We ignore all errors */ if (fd >= 0) @@ -92,41 +124,14 @@ fork_process(void) char buf[16]; int rc; -snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_SCORE_ADJ); +snprintf(buf, sizeof(buf), "%d\n", oom_adj_value); rc = write(fd, buf, strlen(buf)); (void) rc; close(fd); } } -#endif /* LINUX_OOM_SCORE_ADJ */ - /* - * Older Linux kernels have oom_adj not oom_score_adj. This works - * similarly except with a different scale of adjustment values. 
If - * it's necessary to build Postgres to work with either API, you can - * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ. - */ -#ifdef LINUX_OOM_ADJ - { - /* - * Use open() not stdio, to ensure we control the open flags. Some - * Linux security environments reject anything but O_WRONLY. - */ - int fd = open("/proc/self/oom_adj", O_WRONLY, 0); - - /* We ignore all errors */ - if (fd >= 0) - { -char buf[16]; -int rc; - -snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ); -rc = write(fd, buf, strlen(buf)); -(void) rc; -close(fd); - } - } -#endif /* LINUX_OOM_ADJ */ +#endif
Re: [HACKERS] Proposing pg_hibernate
On Wed, Jun 11, 2014 at 10:56 AM, Robert Haas wrote:
> On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh wrote:
>> And it's probably accepted by now that such a behaviour is not
>> catastrophic, merely inconvenient.
>
> I think the whole argument for having pg_hibernator is that getting
> the block cache properly initialized is important. If it's not
> important, then we don't need pg_hibernator in the first place. But
> if it is important, then I think not loading unrelated blocks into
> shared_buffers is also important.

I was constructing a contrived scenario, something that would rarely happen in reality. I feel that the benefits of this feature greatly outweigh the minor performance loss caused in such an unlikely scenario.

Best regards,
-- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
On Wed, Jun 11, 2014 at 12:25 AM, Amit Kapila wrote: > On Wed, Jun 11, 2014 at 7:59 AM, Gurjeet Singh wrote: >> On Sun, Jun 8, 2014 at 3:24 AM, Amit Kapila >> wrote: >> > Yeap, but if it crashes before writing checkpoint record, it will lead >> > to >> > recovery which is what we are considering. >> >> Good point. >> >> In case of such recovery, the recovery process will read in the blocks >> that were recently modified, and were possibly still in shared-buffers >> when Checkpointer crashed. So after recovery finishes, the >> BlockReaders will be invoked (because save-files were successfully >> written before the crash), and they would request the same blocks to >> be restored. Most likely, those blocks would already be in >> shared-buffers, hence no cause of concern regarding BlockReaders >> evicting buffers populated by recovery. > > Not necessarily because after crash, recovery has to start from > previous checkpoint, so it might not perform operations on same > pages as are saved by buffer saver. Granted, the recovery may not start that way (that is, reading in blocks that were in shared-buffers when shutdown was initiated), but it sure would end that way. Towards the end of recovery, the blocks it'd read back in are highly likely to be the ones that were present in shared-buffers at the time of shutdown. By the end of recovery, either (a) blocks read in at the beginning of recovery are evicted by later operations of recovery, or (b) they are still present in shared-buffers. So the blocks requested by the BlockReaders are highly likely to be already in shared-buffers at the end of recovery, because these are the same blocks that were dirty (and hence recorded in WAL) just before shutdown time. I guess what I am trying to say is that the blocks read in by the BlockReaders will be a superset of those read in by the reocvery process. At the time of shutdown/saving-buffers, the shared-buffers may have contained dirty and clean buffers. 
WAL contains the info of which blocks were dirtied. Recovery will read back the blocks that were dirty, to replay the WAL, and since the BlockReaders are started _after_ recovery finishes, the BlockReaders will effectively read in only those blocks that are not already read in by the recovery.

I am not yet convinced, at least in this case, that Postgres Hibernator would restore blocks that can cause eviction of buffers restored by recovery. I don't have intimate knowledge of recovery, but I think the above assessment of recovery's operations holds true. If you still think this is a concern, can you please provide a firmer example so that I can visualize the problem you're talking about?

> Also as the file saved by
> buffer saver can be a file which contains only partial list of
> buffers which were in shared buffer's, it becomes more likely that
> in such cases it can override the buffers populated by recovery.

I beg to differ. As described above, the blocks read in by the BlockReader will not evict the recovery-restored blocks. The save-files being written partially does not change that.

> Now as pg_hibernator doesn't give any preference to usage_count while
> saving buffer's, it can also evict the buffers populated by recovery
> with some lower used pages of previous run.

The case we're discussing (a checkpointer/BufferSaver/some-other-process crash during a smart/fast shutdown) should occur rarely in practice. Although Postgres Hibernator is not yet proven to do the wrong thing in this case, I hope you'd agree that BlockReaders evicting buffers populated by the recovery process is not catastrophic at all, merely inconvenient from a performance perspective. Also, the impact is only on the initial performance immediately after startup, since application queries will re-prime the shared-buffers with whatever buffers they need.

> I think Block Readers will remove file only after reading and populating
> buffers from it

Correct.
> and that's the reason I mentioned that it can lead to doing
> both recovery as well as load buffers based on file saved by buffer
> saver.

I am not sure I completely understand the implication here, but I think
the above description of why recovery followed by BlockReaders does not
cause a concern may cover it.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
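The superset argument made in this thread can be sketched with a toy model: the blocks dirtied before shutdown (hence present in WAL) are a subset of all buffers recorded in the save-file, so the BlockReaders only fault in what recovery did not already restore. This is illustrative Python, not Postgres code; the block names and the simplified buffer pool are assumptions for the sketch.

```python
# Toy model of the argument: at shutdown, the save-file records every
# buffer then resident in shared_buffers, while WAL replay touches only
# the blocks that were dirty at that time.

saved_blocks = {"A", "B", "C", "D"}   # all buffers recorded at shutdown
wal_dirty_blocks = {"B", "D"}         # dirty buffers, hence present in WAL

# Dirty buffers are a subset of all resident buffers, so recovery's reads
# are a subset of what the BlockReaders will request afterwards.
assert wal_dirty_blocks <= saved_blocks

# After crash recovery, the replayed blocks already sit in shared_buffers...
shared_buffers = set(wal_dirty_blocks)

# ...so the BlockReaders only read in the blocks recovery did not touch,
# never evicting a recovery-restored block in this model.
newly_read = saved_blocks - shared_buffers
shared_buffers |= newly_read
assert wal_dirty_blocks <= shared_buffers
```

The model obviously ignores eviction under memory pressure, which is exactly the point being argued: with the save-file a superset of WAL-replayed blocks, restoring it adds blocks rather than displacing recovery's work.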
Re: [HACKERS] Proposing pg_hibernate
On Sun, Jun 8, 2014 at 3:24 AM, Amit Kapila wrote: > On Fri, Jun 6, 2014 at 5:31 PM, Gurjeet Singh wrote: >> On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila >> wrote: > >> > Buffer saver process itself can crash while saving or restoring >> > buffers. >> >> True. That may lead to partial list of buffers being saved. And the >> code in Reader process tries hard to read only valid data, and punts >> at the first sight of data that doesn't make sense or on ERROR raised >> from Postgres API call. > > Inspite of Reader Process trying hard, I think we should ensure by > some other means that file saved by buffer saver is valid (may be > first write in tmp file and then rename it or something else). I see no harm in current approach, since even if the file is partially written on shutdown, or if it is corrupted due to hardware corruption, the worst that can happen is the BlockReaders will try to restore, and possibly succeed, a wrong block to shared-buffers. I am okay with your approach of first writing to a temp file, if others see an advantage of doing this and insist on it. >> > IIUC on shutdown request, postmaster will send signal to BG Saver >> > and BG Saver will save the buffers and then postmaster will send >> > signal to checkpointer to shutdown. So before writing Checkpoint >> > record, BG Saver can crash (it might have saved half the buffers) >> >> Case handled as described above. >> >> > or may BG saver saves buffers, but checkpointer crashes (due to >> > power outage or any such thing). >> >> Checkpointer process' crash seems to be irrelevant to Postgres >> Hibernator's workings. > > Yeap, but if it crashes before writing checkpoint record, it will lead to > recovery which is what we are considering. Good point. In case of such recovery, the recovery process will read in the blocks that were recently modified, and were possibly still in shared-buffers when Checkpointer crashed. 
So after recovery finishes, the BlockReaders will be invoked (because save-files were successfully written before the crash), and they would request the same blocks to be restored. Most likely, those blocks would already be in shared-buffers, hence no cause of concern regarding BlockReaders evicting buffers populated by recovery. >> I think you are trying to argue the wording in my claim "save-files >> are created only on clean shutdowons; not on a crash or immediate >> shutdown", by implying that a crash may occur at any time during and >> after the BufferSaver processing. I agree the wording can be improved. > > Not only wording, but in your above mail Case 2 and 1b would need to > load buffer's and perform recovery as well, so we need to decide which > one to give preference. In the cases you mention, 1b and 2, ideally there will be no save-files because the server either (1b) was still running, or (2) crashed. If there were any save-files present during the previous startup (the one that happened before (1b) hot-backup or (2) crash) of the server, they would have been removed by the BlockReaders soon after the startup. > So If you agree that we should have consideration for recovery data > along with saved files data, then I think we have below options to > consider: I don't think any of the options you mention need any consideration because recovery and buffer-restore process don't seem to be at conflict with each other; not enough to be a concern, IMHO. Thanks and best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Tue, Jun 10, 2014 at 12:02 PM, Robert Haas wrote:
> If recovery has been running for a long time, then restoring
> buffers from some save file created before that is probably a bad
> idea, regardless of whether the buffers already loaded were read in by
> recovery itself or by queries running on the system. But if you're
> saying that doesn't happen, then there's no problem there.

Normally, it won't happen. There's one case I can think of, which has to
coincide with a small window of time for such a thing to happen.
Consider this:

.) A database is shut down, which creates the save-files in
$PGDATA/pg_hibernator/.
.) The database is restarted.
.) BlockReaders begin to read and restore the disk blocks into buffers.
.) Before the BlockReaders could finish*, a copy of the database is
taken (rsync/cp/FS-snapshot/etc.) This causes the save-files to be
present in the copy, because the BlockReaders haven't deleted them, yet.

* (The BlockReaders ideally finish their task in the first few minutes
after the first of them is started.)

.) The copy of the database is used to restore and erect a warm-standby.
.) The warm-standby starts replaying logs from WAL archive/stream.
.) Some time (hours/weeks/months) later, the warm-standby is promoted to
be a master.
.) It starts the Postgres Hibernator, which sees save-files in
$PGDATA/pg_hibernator/ and launches BlockReaders.

At this point, the BlockReaders will restore the blocks that were
present in the original DB's shared-buffers at the time of shutdown. So,
this would fetch blocks into shared-buffers that may be completely
unrelated to the blocks recently operated on by the recovery process.
And it's probably accepted by now that such a behaviour is not
catastrophic, merely inconvenient.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 1:13 PM, David G Johnston wrote:
> Gurjeet Singh-4 wrote
>> So the argument that this GUC is a security concern, can be ignored.
>> Root user (one with control of start script) still controls the lowest
>> badness setting of all Postgres processes. If done at fork_process
>> time, the child process simply inherits parent's badness setting.
>
> The counter point here is that the postmaster can be set to "no kill" and

Only the root user can do that, since he/she has control over the start
script. All that the DBA could do with a GUC is to set backends' badness
worse than postmaster's, but not better.

> the >= condition allows for children to achieve the same while it is our
> explicit intent that the children be strictly > parent.

I don't think anyone argued for that behaviour.

> To that end, should the adjustment value be provided as an offset to the
> postmasters instead of an absolute value - and disallow <= zero offset
> values in the process?

Seems unnecessary, given current knowledge.

> I get and generally agree with the environment variable proposal and it's
> stated goal to restrict whom can makes changes. But how much less cost does
> an environment variable have than a GUC if one GUC argument is still its
> maintenance overhead?

Having it as a GUC would have meant that two entities are required to
get the configuration right: one who controls start scripts, and the
other who controls GUC settings. With the environment variable approach,
a root user alone can control the behaviour, like so in the start
script:

echo -200 > /proc/self/oom_score_adj
export PG_OOM_ADJUST_FILE=oom_score_adj
export PG_OOM_ADJUST_VALUE=-100

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 11:52 AM, Tom Lane wrote: > and...@anarazel.de (Andres Freund) writes: >> On 2014-06-10 11:20:28 -0400, Tom Lane wrote: >>> Maybe I'm mistaken, but I thought once the fork_process code has reset our >>> process's setting to zero it's not possible to lower it again (without >>> privileges we'd not have). > >> No, doesn't look that way. It's possible to reset it to the value set at >> process start. So unless we introduce double forks for every backend >> start it can be reset by ordinary processes. > > That's kind of annoying --- I wonder why they went to the trouble of doing > that? But anyway, it's probably not worth the cost of double-forking to > prevent it. I still say though that this is not a reason to make it as > easy as change-a-GUC to break the intended behavior. > > Robert's idea of having the start script set an environment variable to > control the OOM adjustment reset seems like it would satisfy my concern. I'm fine with this solution. Should this be a constant 0, or be configurable based on env. variable's value? -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 11:24 AM, Andres Freund wrote: > On 2014-06-10 11:14:43 -0400, Tom Lane wrote: >> Sure, but what's that have to do with this? Any Red Hat or PGDG RPM will >> come with this code already enabled in the build, so there is no need for >> anyone to have a GUC to play around with the behavior. > > That's imo a fair point. Unless I misunderstand things Gurjeet picked > the topic up again because he wants to increase the priority of the > children. Is that correct Gurjeet? Yes. A DBA would like to prevent the postmaster from being killed by OOM killer, but let the child processes be still subject to OOM killer's whim. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
's badness, but keep it same as postmaster's
$ echo -100 > /proc/1837/oom_score_adj
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres -D /home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# Lower checkpointer's badness, and try to lower it below postmaster's
$ echo -101 > /proc/1837/oom_score_adj
bash: echo: write error: Permission denied
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres -D /home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# As root, force child process' badness to be lower than postmaster's
$ sudo echo -101 > /proc/1837/oom_score_adj
[sudo] password for gurjeet:
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres -D /home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -101 postgres: checkpointer process
...

-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 10:32 AM, Andres Freund wrote:
> On 2014-06-10 07:56:01 -0400, Gurjeet Singh wrote:
>> Providing it as GUC would have given end users both the pieces, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>>
>> I am not alone in feeling that if Postgres wishes to provide a control
>> over child backend's oom_score_adj, it should be a GUC parameter
>> rather than a compile-time option. Yesterday a customer wanted to
>> leverage this and couldn't because they refuse to maintain their own
>> fork of Postgres code.
>
> Independent of the rest of the discussion, I think there's one more
> point: Trying to keep your system stable by *increasing* the priority of
> normal backends is a bad idea. If you system gets into OOM land you need
> to fix that, not whack who gets killed around.
> The reason it makes sense to increase the priority of the postmaster is
> that that *does* increase the stability by cleaning up resources and
> restarting everything.

As it stands right now, a user can decrease the likelihood of the
postmaster being killed by adjusting the start script, but that
decreases the likelihood for all the child processes too, making the
postmaster just as likely a kill-candidate as any backend, maybe even
more likely because it's the parent.

This patch gives the user a control to let the backends' likelihood of
being killed be different/higher than that of the postmaster. The users
were already capable of doing that, but were required to custom-compile
Postgres to get the benefits. This patch just removes that requirement.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 10:02 AM, Tom Lane wrote:
> Gurjeet Singh writes:
>> Startup scripts are not solely in the domain of packagers. End users
>> can also be expected to develop/edit their own startup scripts.
>
>> Providing it as GUC would have given end users both the pieces, but
>> with a compile-time option they have only one half of the solution;
>> except if they go compile their own binaries, which forces them into
>> being packagers.
>
> I don't find that this argument holds any water at all. Anyone who's
> developing their own start script can be expected to manage recompiling
> Postgres.

I respectfully disagree. Writing and managing init/start scripts
requires a much different set of expertise than compiling and managing
builds of a critical piece of software like a database product. I would
consider adding `echo -1000 > /proc/self/oom_score_adj` to a start
script a deployment-specific tweak, and not 'developing own start
script'.

> Extra GUCs do not have zero cost, especially not ones that are as
> complicated-to-explain as this would be.
>
> I would also argue that there's a security issue with making it a GUC.
> A non-root DBA should not have the authority to decide whether or not
> postmaster child processes run with nonzero OOM adjustment; that decision
> properly belongs to whoever has authorized the root-owned startup script
> to change the adjustment in the first place. So seeing this as two
> independent pieces is not only wrong but dangerous.

From experiments last night, I see that a child process' oom_score_adj
is capped by the setting of the parent process that forks it. So I don't
think it's a security risk from that perspective. I'll retest and
provide the exact findings.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
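The capping behaviour described in the experiments in this thread (an unprivileged process may raise its badness, or lower it only back down to the value it had at process start, while root may go below that floor) can be modelled with a small sketch. This is a toy model of the rule as reported here, not kernel code; the function name and the `floor` parameter are invented for illustration.

```python
# Toy model of a write to /proc/<pid>/oom_score_adj, per the behaviour
# observed in the thread. oom_score_adj ranges over -1000..1000.

def write_oom_score_adj(floor, requested, is_root=False):
    """Return the new oom_score_adj, or raise PermissionError (like EACCES)."""
    if not -1000 <= requested <= 1000:
        raise ValueError("value out of range")
    if not is_root and requested < floor:
        raise PermissionError("write error: Permission denied")
    return requested

# The postmaster was started at -100 by a root-owned start script; a
# forked child inherits that value as its floor.
assert write_oom_score_adj(floor=-100, requested=0) == 0        # raising: ok
assert write_oom_score_adj(floor=-100, requested=-100) == -100  # back to floor: ok

denied = False
try:
    write_oom_score_adj(floor=-100, requested=-101)             # below floor
except PermissionError:
    denied = True
assert denied

# Only root may push a process below its inherited floor.
assert write_oom_score_adj(floor=-100, requested=-101, is_root=True) == -101
```

This mirrors the shell transcript elsewhere in the thread: the unprivileged `echo -101` fails with "Permission denied", while the root-driven write succeeds.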
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Mon, Sep 19, 2011 at 10:18 AM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On sön, 2011-09-18 at 12:21 -0400, Tom Lane wrote:
>>> But having said that, it wouldn't be very hard to arrange things so that
>>> if you did have both symbols defined, the code would only attempt to
>>> write oom_adj if it had failed to write oom_score_adj; which is about as
>>> close as you're likely to get to a kernel version test for this.
>
>> Why is this feature not a run-time configuration variable or at least a
>> configure option? It's awfully well hidden now. I doubt a lot of
>> people are using this even though they might wish to.
>
> See the thread in which the feature was designed originally:
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg00170.php
>
> The key point is that to get useful behavior, you need cooperation
> between both a root-privileged startup script and the PG executable.
> That tends to throw the problem into the domain of packagers, more
> than end users, and definitely puts a big crimp in the idea that
> run-time configuration of just half of the behavior could be helpful.
> So far, no Linux packagers have complained that the design is inadequate
> (a position that I also hold when wearing my red fedora) so I do not
> feel a need to complicate it further.

Startup scripts are not solely in the domain of packagers. End users can
also be expected to develop/edit their own startup scripts.

Providing it as a GUC would have given end users both the pieces, but
with a compile-time option they have only one half of the solution;
except if they go compile their own binaries, which forces them into
being packagers.

I am not alone in feeling that if Postgres wishes to provide a control
over child backends' oom_score_adj, it should be a GUC parameter rather
than a compile-time option. Yesterday a customer wanted to leverage this
and couldn't because they refuse to maintain their own fork of the
Postgres code.
Please find attached a patch to turn it into a GUC, #ifdef'd by
__linux__ macro.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index f6df2de..7b9bc38 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,11 @@
 #endif
 
 #ifndef WIN32
+
+#ifdef __linux__
+int linux_oom_score_adj = 0;
+#endif
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
@@ -78,7 +83,7 @@ fork_process(void)
 	 * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
 	 * want child processes to adopt here.
 	 */
-#ifdef LINUX_OOM_SCORE_ADJ
+#ifdef __linux__
 	{
 		/*
 		 * Use open() not stdio, to ensure we control the open flags. Some
@@ -92,13 +97,12 @@ fork_process(void)
 			char		buf[16];
 			int			rc;
 
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_SCORE_ADJ);
+			snprintf(buf, sizeof(buf), "%d\n", linux_oom_score_adj);
 			rc = write(fd, buf, strlen(buf));
 			(void) rc;
 			close(fd);
 		}
 	}
-#endif /* LINUX_OOM_SCORE_ADJ */
 
 	/*
 	 * Older Linux kernels have oom_adj not oom_score_adj. This works
@@ -106,7 +110,6 @@
 	 * it's necessary to build Postgres to work with either API, you can
 	 * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
 	 */
-#ifdef LINUX_OOM_ADJ
 	{
 		/*
 		 * Use open() not stdio, to ensure we control the open flags. Some
@@ -120,13 +123,13 @@
 			char		buf[16];
 			int			rc;
 
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
+			snprintf(buf, sizeof(buf), "%d\n", linux_oom_score_adj);
 			rc = write(fd, buf, strlen(buf));
 			(void) rc;
 			close(fd);
 		}
 	}
-#endif /* LINUX_OOM_ADJ */
+#endif /* __linux__ */
 
 	/*
 	 * Make sure processes do not share OpenSSL randomness state.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1d094f0..8713abb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -128,6 +128,7 @@ extern bool synchronize_seqscans;
 extern char *SSLCipherSuites;
 extern char *SSLECDHCurve;
 extern bool SSLPreferServerCiphers;
+extern int linux_oom_score_adj;
 
 #ifdef TRACE_SORT
 extern bool trace_sort;
@@ -2554,6 +2555,18 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+#ifdef __linux__
+	{
+		{"linux_oom_score_adj", PGC_POSTMASTER, RESOURCES_KERNEL,
+			gettext_noop("Sets the oom_score_adj parameter of postmaster's child processes."),
+			NULL,
+		},
+		&linux_oom_score_adj,
+		0, -1000, 1000,
+		NULL, NULL, NULL
+	},
+#endif /* __linux__ */
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
Re: [HACKERS] Using Index-only scans to speed up count(*)
On Sat, Jun 7, 2014 at 8:56 AM, Cédric Villemain wrote:
> On Saturday, 7 June 2014 08:35:27, Gurjeet Singh wrote:
>
>> PS: Please note that I am not proposing to add support for the
>> optimizer hint embedded in Mitsuru's query.
>
> :-)

Even though I (sometimes) favor hints, and developed the optimizer hints
feature in EDB (PPAS), I know how much Postgres **hates** [1] optimizer
hints :) So I'm just trying to ward off potential flamewar-ish comments.

[1]: http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
[HACKERS] Using Index-only scans to speed up count(*)
While reading [1] in the context of Postgres Hibernator, I see that
Mitsuru mentioned one of the ways another RDBMS allows count(*) to be
driven by an index.

> 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index blocks

I am not sure if the Postgres planner already allows this, but it would
be great if the planner considered driving a count(*) query using a
non-partial index, in the hopes that it turns into an index-only scan,
and hence returns the count(*) result faster. The non-partial index may
not necessarily be the primary-key index; it can be chosen purely based
on size, favouring smaller indexes.

This may alleviate some of the concerns of people migrating applications
from other DBMSes that perform count(*) in the blink of an eye.

[1]: http://www.postgresql.org/message-id/20110507.08.83883502.iwas...@jp.freebsd.org

Best regards,

PS: Please note that I am not proposing to add support for the optimizer
hint embedded in Mitsuru's query.

-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
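The index-selection heuristic proposed above (any non-partial index can answer an unqualified count(*), so prefer the smallest one) can be sketched in a few lines. This is an illustrative model, not planner code; the `Index` record, the field names, and the example sizes are all assumptions for the sketch.

```python
# Sketch of the proposed heuristic: to drive count(*), pick the smallest
# non-partial index, hoping the scan turns into an index-only scan.

from dataclasses import dataclass

@dataclass
class Index:
    name: str
    pages: int          # relation size in blocks; smaller => fewer reads
    is_partial: bool    # a partial index cannot answer unqualified count(*)

def pick_count_star_index(indexes):
    """Return the smallest non-partial index, or None (fall back to seqscan)."""
    candidates = [ix for ix in indexes if not ix.is_partial]
    if not candidates:
        return None
    return min(candidates, key=lambda ix: ix.pages)

emp_indexes = [
    Index("emp_pk", pages=1200, is_partial=False),
    Index("emp_active_idx", pages=90, is_partial=True),   # partial: skipped
    Index("emp_dept_idx", pages=300, is_partial=False),
]
assert pick_count_star_index(emp_indexes).name == "emp_dept_idx"
```

Note how the heuristic passes over the smaller partial index and over the primary key, landing on the smallest index that still covers every row, exactly the preference described above.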
Re: [HACKERS] Proposing pg_hibernate
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila wrote:
> On Thu, Jun 5, 2014 at 5:39 PM, Gurjeet Singh wrote:
>>
>> > On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh wrote:
>> Case 2 also won't cause any buffer restores because the save-files are
>> created only on clean shutdowns; not on a crash or immediate
>> shutdown.
>
> How do you ensure that buffers are saved only on clean shutdown?

Postmaster sends SIGTERM only in "smart" or "fast" shutdown requests.

> Buffer saver process itself can crash while saving or restoring
> buffers.

True. That may lead to a partial list of buffers being saved. And the
code in the Reader process tries hard to read only valid data, and punts
at the first sight of data that doesn't make sense, or on an ERROR
raised from a Postgres API call.

> IIUC on shutdown request, postmaster will send signal to BG Saver
> and BG Saver will save the buffers and then postmaster will send
> signal to checkpointer to shutdown. So before writing Checkpoint
> record, BG Saver can crash (it might have saved half the buffers)

Case handled as described above.

> or may BG saver saves buffers, but checkpointer crashes (due to
> power outage or any such thing).

Checkpointer process' crash seems to be irrelevant to Postgres
Hibernator's workings.

I think you are trying to argue the wording in my claim "save-files are
created only on clean shutdowns; not on a crash or immediate shutdown",
by implying that a crash may occur at any time during and after the
BufferSaver processing. I agree the wording can be improved. How about
... save-files are created only when Postgres is requested to shut down
in normal (smart or fast) modes. Note that I am leaving out the mention
of crash.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila wrote: > Another thing is don't you want to handle SIGQUIT signal in bg saver? I think bgworker_quickdie registered in StartBackgroundWorker() serves the purpose just fine. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Wed, Jun 4, 2014 at 2:50 PM, Robert Haas wrote: > The thing I was concerned about is that the system might have been in > recovery for months. What was hot at the time the base backup was > taken seems like a poor guide to what will be hot at the time of > promotion. Consider a history table, for example: the pages at the > end, which have just been written, are much more likely to be useful > than anything earlier. I think you are specifically talking about a warm-standby that runs recovery for months before being brought online. As described in my response to Amit, if the base backup used to create that standby was taken after the BlockReaders had restored the buffers (which should complete within few minutes of startup, even for large databases), then there's no concern since the base backup wouldn't contain the save-files. If it's a hot-standby, the restore process would start as soon as the database starts accepting connections, finish soon after, and get completely out of the way of the normal recovery process. At which point the buffers populated by the recovery would compete only with the buffers being requested by backends, which is the normal behaviour. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Wed, Jun 4, 2014 at 2:52 PM, Andres Freund wrote: > On 2014-06-04 14:50:39 -0400, Robert Haas wrote: >> The thing I was concerned about is that the system might have been in >> recovery for months. What was hot at the time the base backup was >> taken seems like a poor guide to what will be hot at the time of >> promotion. Consider a history table, for example: the pages at the >> end, which have just been written, are much more likely to be useful >> than anything earlier. > > I'd assumed that the hibernation files would simply be excluded from the > basebackup... Yes, they will be excluded, provided the BlockReader processes have finished, because each BlockReader unlinks its save-file after it is done restoring buffers listed in it. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposing pg_hibernate
On Wed, Jun 4, 2014 at 12:54 AM, Amit Kapila wrote:
> On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh wrote:
>>
>> For sizeable shared_buffers size, the restoration of the shared
>> buffers can take several seconds.
>
> Incase of recovery, the shared buffers saved by this utility are
> from previous shutdown which doesn't seem to be of more use
> than buffers loaded by recovery.

I feel the need to enumerate the recovery scenarios we're talking about
so that we're all on the same page.

1) Hot backup (cp/rsync/pg_basebackup/.. while the master was running)
followed by
  1a) recovery using archives or streaming replication.
    1a.i) database in hot-standby mode
    1a.ii) database not in hot-standby mode, i.e. it's in warm-standby mode.
  1b) minimal recovery, that is, recover only the WAL available in
pg_xlog, then come online.
2) Cold backup of a crashed master, followed by startup of the copy
(causing crash recovery; IMHO same as case 1b above).
3) Cold backup of a clean-shutdown master, followed by startup of the
copy (no recovery).

In cases 1.x there won't be any save-files (*), because the BlockReader
processes remove their respective save-file when they are done restoring
the buffers. So the hot/warm-standby created thus will not inherit the
save-files, and hence post-recovery will not cause any buffer restores.

Case 2 also won't cause any buffer restores, because the save-files are
created only on clean shutdowns; not on a crash or immediate shutdown.

Case 3 is the sweet spot of pg_hibernator. It will save the buffer-list
on shutdown, and restore it when the backup-copy is started (provided
pg_hibernator is installed there).

(*) If a hot-backup is taken immediately after the database comes
online, since the BlockReaders may still be running and not have deleted
the save-files, the save-files may end up in the backup, and hence cause
the recovery-time conflicts we're talking about.
This should be rare in practice, and even when it does happen, at worst it will affect the initial performance of the cluster. >> I have a feeling the users wouldn't >> like their master database take up to a few minutes to start accepting >> connections. > > I think this is fair point and to address this we can have an option to > decide when to load buffer's and have default value as load before > recovery. Given the above description, I don't think crash/archive recovery is a concern anymore. But if that corner case is still a concern, I wouldn't favour making recovery slow by default, and make users of pg_hibernator pay for choosing to use the extension. I'd prefer the user explicitly ask for a behaviour that makes startups slow. > One other point: >> Note that the BuffersSaver process exists at all times, even when this >> parameter is set to `false`. This is to allow the DBA to enable/disable >> the >> extension without having to restart the server. The BufferSaver process >> checks this parameter during server startup and right before shutdown, and >> honors this parameter's value at that time. > > Why can't it be done when user register's the extension by using dynamic > background facility "RegisterDynamicBackgroundWorker"? There's no user interface to this extension except for the 3 GUC parameters; not even CREATE EXTENSION. The DBA is expected to append this extension's name in shared_preload_libraries. Since this extension declares one of its parameters PGC_POSTMASTER, it can't be loaded via the SQL 'LOAD ' command. postgres=# load 'pg_hibernator'; FATAL: cannot create PGC_POSTMASTER variables after startup FATAL: cannot create PGC_POSTMASTER variables after startup The connection to the server was lost. Attempting reset: Succeeded. Best regards, PS: I was out sick yesterday, so couldn't respond promptly. 
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
On Tue, Jun 3, 2014 at 8:13 AM, Gurjeet Singh wrote:
> On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas wrote:
>> On Thu, May 29, 2014 at 12:12 AM, Amit Kapila wrote:
>>>> IMHO, all of these caveats, would affect a very small fraction of
>>>> use-cases and are eclipsed by the benefits this extension provides in
>>>> normal cases.
>>>
>>> I agree with you that there are only few corner cases where evicting
>>> shared buffers by this utility would harm, but was wondering if we could
>>> even save those, say if it would only use available free buffers. I think
>>> currently there is no such interface and inventing a new interface for this
>>> case doesn't seem to reasonable unless we see any other use case of
>>> such a interface.
>>
>> It seems like it would be best to try to do this at cluster startup
>> time, rather than once recovery has reached consistency. Of course,
>> that might mean doing it with a single process, which could have its
>> own share of problems. But I'm somewhat inclined to think that if

Currently pg_hibernator uses the ReadBufferExtended() API, and AIUI,
that API requires a database connection / shared-memory attachment, and
a backend process cannot switch between databases after the initial
connection.

>> own share of problems. But I'm somewhat inclined to think that if
>> recovery has already run for a significant period of time, the blocks
>> that recovery has brought into shared_buffers are more likely to be
>> useful than whatever pg_hibernate would load.

The applications that connect to a standby may have a different access
pattern than the applications that are operating on the master database.
So the buffers that are being restored by the startup process may not be
relevant to the workload on the standby.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
On Tue, Jun 3, 2014 at 7:57 AM, Robert Haas wrote:
> On Thu, May 29, 2014 at 12:12 AM, Amit Kapila wrote:
>>> IMHO, all of these caveats, would affect a very small fraction of
>>> use-cases and are eclipsed by the benefits this extension provides in
>>> normal cases.
>>
>> I agree with you that there are only a few corner cases where evicting
>> shared buffers by this utility would harm, but was wondering if we could
>> even save those, say if it would only use available free buffers. I think
>> currently there is no such interface, and inventing a new interface for this
>> case doesn't seem reasonable unless we see some other use case for
>> such an interface.
>
> It seems like it would be best to try to do this at cluster startup
> time, rather than once recovery has reached consistency. Of course,
> that might mean doing it with a single process, which could have its
> own share of problems. But I'm somewhat inclined to think that if
> recovery has already run for a significant period of time, the blocks
> that recovery has brought into shared_buffers are more likely to be
> useful than whatever pg_hibernate would load.

I am not absolutely sure of the order of execution between the recovery process and the BGWorker, but ...

For sizeable shared_buffers sizes, the restoration of the shared buffers can take several seconds. I have a feeling the users wouldn't like their master database taking up to a few minutes to start accepting connections.

From my tests [1], "In the 'App after Hibernator' [case] ... This took 70 seconds for reading the ~4 GB database."

[1]: http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
On Fri, May 30, 2014 at 5:33 PM, Josh Kupershmidt wrote:
> On Tue, May 27, 2014 at 10:01 PM, Gurjeet Singh wrote:
>
>> When the Postgres server is being stopped/shut down, the `Buffer
>> Saver` scans the shared-buffers of Postgres, and stores the unique
>> block identifiers of each cached block to the disk. This information
>> is saved under the `$PGDATA/pg_hibernator/` directory. For each of
>> the databases whose blocks are resident in shared buffers, one file
>> is created; for eg.: `$PGDATA/pg_hibernator/2.postgres.save`.
>
> This file-naming convention seems a bit fragile. For example, on my
> filesystem (HFS) if I create a database named "foo / bar", I'll get a
> complaint like:
>
> ERROR: could not open "pg_hibernator/5.foo / bar.save": No such file
> or directory
>
> during shutdown.

Thanks for the report. I have reworked the file naming, and now the save-file name is simply '.save', so the name of a database does not affect the file name on disk. Instead, the null-terminated database name is now written in the file, and read back for use when restoring the buffers.

Attached is the new version of pg_hibernator, with updated code and README.

Just a heads up for anyone who might have read/reviewed the previous version's code: there are some unrelated trivial code and Makefile changes as well in this version, which can be easily spotted by a `diff -r`.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

pg_hibernate.v2.tgz Description: GNU Zip compressed data
Re: [HACKERS] Proposing pg_hibernate
On May 29, 2014 12:12 AM, "Amit Kapila" wrote: > > I agree with you that there are only few corner cases where evicting > shared buffers by this utility would harm, but was wondering if we could > even save those, say if it would only use available free buffers. I think > currently there is no such interface and inventing a new interface for this > case doesn't seem to reasonable unless we see any other use case of > such a interface. +1
Re: [HACKERS] Proposing pg_hibernate
On Wed, May 28, 2014 at 2:15 AM, Amit Kapila wrote:
> On Wed, May 28, 2014 at 7:31 AM, Gurjeet Singh wrote:
>> Caveats
>> -------
>>
>> - Buffer list is saved only when Postgres is shutdown in "smart" and
>>   "fast" modes.
>>
>>   That is, buffer list is not saved when database crashes, or on
>>   "immediate" shutdown.
>>
>> - A reduction in `shared_buffers` is not detected.
>>
>>   If the `shared_buffers` is reduced across a restart, and if the combined
>>   saved buffer list is larger than the new shared_buffers, Postgres
>>   Hibernator continues to read and restore blocks even after
>>   `shared_buffers` worth of buffers have been restored.
>
> How about the cases when shared buffers already contain some
> data:
> a. Before Readers start filling shared buffers, if this cluster wishes
> to join replication as a slave and receive the data from master, then
> this utility might need to evict some buffers filled during startup
> phase.

A cluster that wishes to be a replication standby would do so while it's in the startup phase. The BlockReaders are launched immediately on the cluster reaching a consistent state, at which point, I presume, in most of the cases, most of the buffers would be unoccupied. Hence the BlockReaders might evict some occupied buffers, but those may be a small fraction of total shared_buffers.

> b. As soon as the server completes startup (reached consistent
> point), it allows new connections which can also use some shared
> buffers before Reader process could use shared buffers or are you
> planning to change the time when users can connect to database.

The BlockReaders are launched immediately after the cluster reaches a consistent state, that is, just about when it is ready to accept connections. So yes, there is a possibility that the I/O caused by the BlockReaders may affect the performance of queries executed right at cluster startup.

But given that the performance of those queries was anyway going to be low (because of empty shared buffers), and that BlockReaders tend to cause sequential reads, and that by default there's only one BlockReader active at a time, I think this won't be a concern in most of the cases. By the time the shared buffers start getting filled up, the buffer replacement strategy will evict any buffers populated by BlockReaders if they are not used by the normal queries.

In the 'Sample Runs' section of my blog [1], I compared the cases 'Hibernator w/ App' and 'Hibernator then App', which demonstrate that launching application load while the BlockReaders are active does cause the performance of both to be impacted by each other. But overall it's a net win for application performance.

> I am not sure if replacing shared buffer contents in such cases can
> always be considered useful.

IMHO, all of these caveats would affect a very small fraction of use-cases and are eclipsed by the benefits this extension provides in normal cases.

[1]: http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Proposing pg_hibernate
ring server startup and right before shutdown, and honors this parameter's value at that time. To enable/disable Postgres Hibernator at runtime, change the value in `postgresql.conf` and use `pg_ctl reload` to make Postgres re-read the new parameter values from `postgresql.conf`.

  Default value: `true`.

- `pg_hibernator.parallel`

  This parameter controls whether Postgres Hibernator launches the BlockReader processes in parallel, or sequentially, waiting for the current BlockReader to exit before launching the next one.

  When enabled, all the BlockReaders, one for each database, will be launched simultaneously, and this may cause a huge random-read flood on the disks if there are many databases in the cluster. This may also cause some BlockReaders to fail to launch successfully because of the `max_worker_processes` limit.

  Default value: `false`.

- `pg_hibernator.default_database`

  The BufferSaver process needs to connect to a database in order to perform the database-name lookups, etc. This parameter controls which database the BufferSaver process connects to for performing these operations.

  Default value: `postgres`.

Caveats
-------

- Buffer list is saved only when Postgres is shutdown in "smart" and "fast" modes.

  That is, buffer list is not saved when database crashes, or on "immediate" shutdown.

- A reduction in `shared_buffers` is not detected.

  If the `shared_buffers` is reduced across a restart, and if the combined saved buffer list is larger than the new shared_buffers, Postgres Hibernator continues to read and restore blocks even after `shared_buffers` worth of buffers have been restored.

FAQ
---

- What is the relationship between `pg_buffercache`, `pg_prewarm`, and `pg_hibernator`?

  They all allow you to do different things with Postgres' shared buffers.

  + pg_buffercache: Inspect and show contents of shared buffers.
  + pg_prewarm: Load some table/index/fork blocks into shared buffers. User needs to tell it which blocks to load.
  + pg_hibernator: Upon shutdown, saves the list of blocks stored in shared buffers. Upon startup, loads those blocks back into shared buffers.

  The goal of Postgres Hibernator is to be invisible to the user/DBA. Whereas with `pg_prewarm` the user needs to know a lot of stuff about what they really want to do, most likely information gathered via `pg_buffercache`.

- Does `pg_hibernate` use either `pg_buffercache` or `pg_prewarm`?

  No, Postgres Hibernator works all on its own. If the concern is, "Do I have to install pg_buffercache and pg_prewarm to use pg_hibernator?", the answer is no. pg_hibernator is a stand-alone extension, although influenced by pg_buffercache and pg_prewarm.

  With `pg_prewarm` you can load blocks of **only** the database you're connected to. So if you have `N` databases in your cluster, to restore blocks of all databases, the DBA will have to connect to each database and invoke `pg_prewarm` functions. With `pg_hibernator`, the DBA isn't required to do anything, let alone connect to the database!

- Where can I learn more about it?

  There are a couple of blog posts and the initial proposal to the Postgres hackers' mailing list. They may provide a better understanding of Postgres Hibernator.

  [Proposal](http://www.postgresql.org/message-id/cabwtf4ui_anag+ybsefunah5z6de9aw2npdy4hryk+m5odx...@mail.gmail.com)
  [Introducing Postgres Hibernator](http://gurjeet.singh.im/blog/2014/02/03/introducing-postgres-hibernator/)
  [Demonstrating Performance Benefits](http://gurjeet.singh.im/blog/2014/04/30/postgres-hibernator-reduce-planned-database-down-times/)

On Mon, Feb 3, 2014 at 7:18 PM, Gurjeet Singh wrote:
> Please find attached the pg_hibernate extension. It is a
> set-it-and-forget-it solution to enable hibernation of Postgres
> shared-buffers. It can be thought of as an amalgam of pg_buffercache and
> pg_prewarm.
>
> It uses the background worker infrastructure. It registers one worker
> process (BufferSaver) to save the shared-buffer metadata when server is
> shutting down, and one worker per database (BlockReader) when restoring the
> shared buffers.
>
> It stores the buffer metadata under $PGDATA/pg_database/, one file per
> database, and one separate file for global objects. It sorts the list of
> buffers before storage, so that when it encounters a range of consecutive
> blocks of a relation's fork, it stores that range as just one entry, hence
> reducing the storage and I/O overhead.
>
> On-disk binary format, which is used to create the per-database save-files,
> is defined as:
> 1. One or more relation filenodes; stored as r.
> 2. Each relation is followed by one or more fork numbers; stored as &
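Putting the named parameters above together, a minimal `postgresql.conf` fragment might look like the following. This is an illustrative sketch, not taken from the extension's distribution; the values shown are the defaults documented in the README:

```ini
# Load the extension at server start (change requires restart).
shared_preload_libraries = 'pg_hibernate'

# Launch BlockReaders sequentially to avoid a random-read flood
# when the cluster has many databases.
pg_hibernator.parallel = false

# Database the BufferSaver connects to for database-name lookups.
pg_hibernator.default_database = 'postgres'
```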
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Thu, Jul 18, 2013 at 1:54 PM, Gurjeet Singh wrote:
> On Thu, Jul 18, 2013 at 10:19 AM, Tom Lane wrote:
>
>> > Because simpler code is less likely to have bugs and is easier to
>> > maintain.
>>
>> I agree with that point, but one should also remember Polya's Inventor's
>> Paradox: the more general problem may be easier to solve. That is, if
>> done right, code that fully flattens an AND tree might actually be
>> simpler than code that does just a subset of the transformation. The
>> current patch fails to meet this expectation,
>
> The current patch does completely flatten any type of tree
> (left/right-deep or bushy) without recursing, and right-deep and bushy tree
> processing is what Robert is recommending to defer to recursive processing.
> Maybe I haven't considered a case where it doesn't flatten the tree; do you
> have an example in mind?
>
>> but maybe you just haven't
>> thought about it the right way.

I tried to eliminate the 'pending' list, but I don't see a way around it. We need temporary storage somewhere to store the branches encountered on the right; in the recursion case the call stack was serving that purpose.

>> My concerns about this patch have little to do with that, though, and
>> much more to do with the likelihood that it breaks some other piece of
>> code that is expecting AND/OR to be strictly binary operators, which
>> is what they've always been in parsetrees that haven't reached the
>> planner. It doesn't appear to me that you've done any research on that
>> point whatsoever
>
> No, I haven't, and I might not be able to research it for a few more weeks.

There are about 30 files (including optimizer and executor) that match a case-insensitive search for BoolExpr, and I scanned those for the usage of the member 'args'. All the instances where BoolExpr.args is being accessed, it's being treated as a null-terminated list. There's one exception that I could find, and it was in the context of a NOT expression: not_clause() in clauses.c.

>> you have not even updated the comment for BoolExpr
>> (in primnodes.h) that this patch falsifies.
>
> I will fix that.

I think this line in that comment already covers the fact that in some "special" cases a BoolExpr may have more than 2 arguments: "There are also a few special cases where more arguments can appear before optimization."

I have updated the comment nevertheless, and removed another comment in parse_expr.c that claimed to be the only place where a BoolExpr with more than 2 args is generated.

I have isolated the code for right-deep and bushy tree processing via the macro PROCESS_BUSHY_TREES. Also, I have shortened some variable names while retaining their meaning.

Please find the updated patch attached (based on master).

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 81c9338..eb35d70 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -41,8 +41,7 @@ bool Transform_null_equals = false;
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
 static Node *transformParamRef(ParseState *pstate, ParamRef *pref);
 static Node *transformAExprOp(ParseState *pstate, A_Expr *a);
-static Node *transformAExprAnd(ParseState *pstate, A_Expr *a);
-static Node *transformAExprOr(ParseState *pstate, A_Expr *a);
+static Node *transformAExprAndOr(ParseState *pstate, A_Expr *a);
 static Node *transformAExprNot(ParseState *pstate, A_Expr *a);
 static Node *transformAExprOpAny(ParseState *pstate, A_Expr *a);
 static Node *transformAExprOpAll(ParseState *pstate, A_Expr *a);
@@ -224,10 +223,10 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 				result = transformAExprOp(pstate, a);
 				break;
 			case AEXPR_AND:
-				result = transformAExprAnd(pstate, a);
+				result = transformAExprAndOr(pstate, a);
 				break;
 			case AEXPR_OR:
-				result = transformAExprOr(pstate, a);
+				result = transformAExprAndOr(pstate, a);
 				break;
 			case AEXPR_NOT:
 				result = transformAExprNot(pstate, a);
@@ -918,32 +917,102 @@ transformAExprOp(ParseState *pstate, A_Expr *a)
 	return result;
 }

+/*
+ * Transform the AND/OR trees non-recursively.
+ *
+ * The parser turns a list of consecutive AND expressions into a left-deep tree.
+ *
+ * a AND b AND c
+ *
+ *      AND
+ *     /   \
+ *   AND    c
+ *  /   \
+ * a     b
+ *
+ * For very long lists, it gets deep enough that processing it recursively causes
+ * check_stack_depth() to raise error and abort the query. Hence, it is necessary
+ * that we proc
Re: [HACKERS] Patch to send transaction commit/rollback stats to the stats collector unconditionally.
On Wed, Mar 19, 2014 at 7:27 PM, Tom Lane wrote:
> Alvaro Herrera writes:
> > I'm not sure I understand the point of this whole thing. Realistically,
> > how many transactions are there that do not access any database tables?
>
> I think that something like "select * from pg_stat_activity" might not
> bump any table-access counters, once the relevant syscache entries had
> gotten loaded. You could imagine that a monitoring app would do a long
> series of those and nothing else (whether any actually do or not is a
> different question).

+1. This is exactly what I was doing; querying pg_stat_database from a psql session, to track transaction counters.

> But still, it's a bit hard to credit that this patch is solving any real
> problem. Where's the user complaints about the existing behavior?

Consider my original email a user complaint, albeit with a patch attached. I was trying to ascertain TPS on a customer's instance when I noticed this behaviour; it violated POLA. Had I not decided to dig through the code to find the source of this behaviour, as a regular user I would've reported this anomaly as a bug, or maybe turned a blind eye to it, labeling it as quirky behaviour.

> That is, even granting that anybody has a workload that acts like this,
> why would they care ...

To avoid surprises! This behaviour, at least in my case, causes Postgres to misreport the very thing I am trying to calculate.

> and are they prepared to take a performance hit
> to avoid the counter jump after the monitoring app exits?

It doesn't look like we know how much of a performance hit this will cause. I don't see any reasons cited in the code, either. Could this be a case of premature optimization?

The commit log shows that, during the overhaul (commit 77947c5), this check was put in place to emulate what the previous code did; code before that commit reported stats, including transaction stats, only if there were any regular or shared table stats to report.
Besides, there's already a throttle built in, using the PGSTAT_STAT_INTERVAL limit.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
Re: [HACKERS] Patch to send transaction commit/rollback stats to the stats collector unconditionally.
On Wed, Mar 19, 2014 at 4:22 PM, Tom Lane wrote:
> Gurjeet Singh writes:
> > Please find attached the patch to send transaction commit/rollback stats
> > to stats collector unconditionally.
>
> That's intentional to reduce stats traffic. What kind of performance
> penalty does this patch impose? If the number of such transactions is
> large enough to create a noticeable jump in the counters, I would think
> that this would be a pretty darn expensive "fix".

I can't speak to the performance impact of this patch, except that it would depend on the fraction of transactions that behave this way. Perhaps the people who develop and/or aggressively use monitoring can pitch in.

Presumably, on heavily used systems these transactions would form a small fraction. On relatively idle systems these transactions may be a larger fraction, but that wouldn't affect the users since the database is not under stress anyway.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
[HACKERS] Patch to send transaction commit/rollback stats to the stats collector unconditionally.
Please find attached the patch to send transaction commit/rollback stats to the stats collector unconditionally.

Without this patch, the transactions that do not read from/write to a database table do not get reported to the stats collector until the client disconnects. Hence the transaction counts in pg_stat_database do not increment gradually, as one would expect them to. But when such a backend disconnects, the counts jump dramatically, giving the impression that the database processed a lot of transactions (potentially thousands) in an instant.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

commit 500d11f96b21552872bad4e9655bd45db28efabd
Author: Gurjeet Singh
Date:   Wed Mar 19 13:41:55 2014 -0400

    Send transaction stats to the collector even if no table stats to report.

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3dc280a..47008ed 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -825,11 +825,11 @@ pgstat_report_stat(bool force)
 	}

 	/*
-	 * Send partial messages. If force is true, make sure that any pending
-	 * xact commit/abort gets counted, even if no table stats to send.
+	 * Send partial messages. Make sure that any pending xact commit/abort gets
+	 * counted, even if no table stats to send.
 	 */
 	if (regular_msg.m_nentries > 0 ||
-		(force && (pgStatXactCommit > 0 || pgStatXactRollback > 0)))
+		force || (pgStatXactCommit > 0 || pgStatXactRollback > 0))
 		pgstat_send_tabstat(&regular_msg);
 	if (shared_msg.m_nentries > 0)
 		pgstat_send_tabstat(&shared_msg);
Re: [HACKERS] Cleaner build output when not much has changed
On Mon, Mar 10, 2014 at 8:12 PM, Alvaro Herrera wrote:
> Gurjeet Singh wrote:
> > On Tue, Nov 26, 2013 at 12:37 PM, Tom Lane wrote:
> > > Gurjeet Singh writes:
> > > > I was looking for ways to reduce the noise in Postgres make output,
> > > > specifically, I wanted to eliminate the "Nothing to be done for `all' "
> > > > messages, since they don't add much value, and just add to the clutter.
> > >
> > > Why don't you just use "make -s" if you don't want to see that?
> > > The example output you show is not much less verbose than before.
> >
> > I have a shell function that now adds --no-print-directory to my make
> > command. This patch combined with that switch makes the output really clean
> > (at least from my perspective). Since the use of a command-line switch can
> > easily be left to personal choice, I am not proposing to add that or its
> > makefile-equivalent. But modifying the makefiles to suppress noise is not
> > something everyone can be expected to do, and do right.
>
> FWIW you can add a src/Makefile.custom file with this:
>
> all:
> 	@true
>
> and it will get rid of most noise.

As I noted in the first email in this thread, this causes a warning:

GNUmakefile:14: warning: overriding commands for target `all'
/home/gurjeet/dev/POSTGRES/src/Makefile.custom:2: warning: ignoring old commands for target `all'

I have since settled for `make -s`. On slow builds it keeps me guessing for a long time, without any indication of progress, but I've learned to live with that.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
[HACKERS] Proposing pg_hibernate
Please find attached the pg_hibernate extension. It is a set-it-and-forget-it solution to enable hibernation of Postgres shared-buffers. It can be thought of as an amalgam of pg_buffercache and pg_prewarm.

It uses the background worker infrastructure. It registers one worker process (BufferSaver) to save the shared-buffer metadata when server is shutting down, and one worker per database (BlockReader) when restoring the shared buffers.

It stores the buffer metadata under $PGDATA/pg_database/, one file per database, and one separate file for global objects. It sorts the list of buffers before storage, so that when it encounters a range of consecutive blocks of a relation's fork, it stores that range as just one entry, hence reducing the storage and I/O overhead.

On-disk binary format, which is used to create the per-database save-files, is defined as:
1. One or more relation filenodes; stored as r.
2. Each relation is followed by one or more fork numbers; stored as f.
3. Each fork number is followed by one or more block numbers; stored as b.
4. Each block number is followed by zero or more range numbers; stored as N.

{r {f {b N* }+ }+ }+

Possible enhancements:
- Ability to save/restore only specific databases.
- Control how many BlockReaders are active at a time; to avoid I/O storms.
- Be smart about lowered shared_buffers across the restart.
- Different modes of reading like pg_prewarm does.
- Include PgFincore functionality, at least for Linux platforms.

The extension currently works with PG 9.3, and may work on 9.4 without any changes; I haven't tested, though. If not, I think it'd be easy to port to HEAD/PG 9.4. I see that 9.4 has put a cap on maximum background workers via a GUC, and since my aim is to provide a non-intrusive no-tuning-required extension, I'd like to use the new dynamic-background-worker infrastructure in 9.4, which doesn't seem to have any preset limits (I think it's limited by max_connections, but I may be wrong).
I can work on a 9.4 port, if there's interest in including this as a contrib/ module.

To see the extension in action:
.) Compile it.
.) Install it.
.) Add it to shared_preload_libraries.
.) Start/restart Postgres.
.) Install pg_buffercache extension, to inspect the shared buffers.
.) Note the result of pg_buffercache view.
.) Work on your database to fill up the shared buffers.
.) Note the result of pg_buffercache view, again; there should be more blocks than last time we checked.
.) Stop and start the Postgres server.
.) Note the output of pg_buffercache view; it should contain the blocks seen just before the shutdown.
.) Future server restarts will automatically save and restore the blocks in shared-buffers.

The code is also available as a Git repository at https://github.com/gurjeet/pg_hibernate/

Demo:

$ make -C contrib/pg_hibernate/
$ make -C contrib/pg_hibernate/ install

$ vi $B/db/data/postgresql.conf
$ grep shared_preload_libraries $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_hibernate'	# (change requires restart)

$ pgstart
waiting for server to start done
server started

$ pgsql -c 'create extension pg_buffercache;'
CREATE EXTENSION

$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
 count
-------
   163
    14
(2 rows)

$ pgsql -c 'create table test_hibernate as select s as a, s::char(1000) as b from generate_series(1, 10) as s;'
SELECT 10

$ pgsql -c 'create index on test_hibernate(a);'
CREATE INDEX

$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
 count
-------
  2254
    14
(2 rows)

$ pgstop
waiting for server to shut down... done
server stopped

$ pgstart
waiting for server to start done
server started

$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
 count
-------
  2264
    17
(2 rows)

There are a few more blocks than at the time they were saved, but all the blocks from before the restart are present in shared buffers after the restart.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

pg_hibernate.tgz Description: GNU Zip compressed data
[HACKERS] Minor improvements to sslinfo contrib module
Please find attached the patch that fixes a couple of comments, and adds the 'static' qualifier to functions that are not used anywhere else in the code base.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com

diff --git a/contrib/sslinfo/sslinfo.c b/contrib/sslinfo/sslinfo.c
index 7a58470..18405f6 100644
--- a/contrib/sslinfo/sslinfo.c
+++ b/contrib/sslinfo/sslinfo.c
@@ -31,9 +31,10 @@ Datum ssl_client_dn_field(PG_FUNCTION_ARGS);
 Datum ssl_issuer_field(PG_FUNCTION_ARGS);
 Datum ssl_client_dn(PG_FUNCTION_ARGS);
 Datum ssl_issuer_dn(PG_FUNCTION_ARGS);
-Datum X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
-Datum X509_NAME_to_text(X509_NAME *name);
-Datum ASN1_STRING_to_text(ASN1_STRING *str);
+
+static Datum X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
+static Datum X509_NAME_to_text(X509_NAME *name);
+static Datum ASN1_STRING_to_text(ASN1_STRING *str);

 /*
@@ -51,7 +52,7 @@ ssl_is_used(PG_FUNCTION_ARGS)

 /*
- * Returns SSL cipher currently in use.
+ * Returns SSL version currently in use.
  */
 PG_FUNCTION_INFO_V1(ssl_version);
 Datum
@@ -77,7 +78,7 @@ ssl_cipher(PG_FUNCTION_ARGS)

 /*
- * Indicates whether current client have provided a certificate
+ * Indicates whether current client provided a certificate
  *
  * Function has no arguments. Returns bool. True if current session
  * is SSL session and client certificate is verified, otherwise false.
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Wed, Nov 27, 2013 at 9:12 AM, Robert Haas wrote:
> This is a performance patch, so it should come with benchmark results
> demonstrating that it accomplishes its intended purpose. I don't see
> any.

Yes, this is a performance patch, but as the subject says, it saves a few instructions. I don't know how to write a test case that can measure the savings of skipping a few instructions in a startup sequence that potentially takes thousands, or even millions, of instructions.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB Corp. www.EnterpriseDB.com
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Tue, Nov 26, 2013 at 9:42 PM, Peter Eisentraut wrote:
> src/backend/postmaster/postmaster.c:2255: indent with spaces.
> +else
> src/backend/postmaster/postmaster.c:2267: indent with spaces.
> +break

Not sure how that happened! Attached is the updated patch.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EnterpriseDB Inc. www.enterprisedb.com

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ccb8b86..cdae6e5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2236,6 +2236,20 @@ ClosePostmasterPorts(bool am_syslogger)
 			StreamClose(ListenSocket[i]);
 			ListenSocket[i] = PGINVALID_SOCKET;
 		}
+		else
+		{
+			/*
+			 * Do not process the rest of the array elements since we expect
+			 * the presence of an invalid socket id to mark the end of valid
+			 * elements.
+			 */
+#ifdef USE_ASSERT_CHECKING
+			int			j;
+
+			for (j = i; j < MAXLISTEN; j++)
+				Assert(ListenSocket[j] == PGINVALID_SOCKET);
+#endif
+			break;
+		}
 	}

 	/* If using syslogger, close the read side of the pipe */
Re: [HACKERS] Cleaner build output when not much has changed
On Tue, Nov 26, 2013 at 12:37 PM, Tom Lane wrote:
> Gurjeet Singh writes:
> > I was looking for ways to reduce the noise in Postgres make output,
> > specifically, I wanted to eliminate the "Nothing to be done for `all' "
> > messages, since they don't add much value, and just add to the clutter.
>
> Why don't you just use "make -s" if you don't want to see that?
> The example output you show is not much less verbose than before.

I have a shell function that now adds --no-print-directory to my make command. This patch combined with that switch makes the output really clean (at least from my perspective). Since the use of a command-line switch can easily be left to personal choice, I am not proposing to add that or its makefile-equivalent. But modifying the makefiles to suppress noise is not something everyone can be expected to do, and do right.

> I'm pretty suspicious of cute changes like this to the makefiles.
> They too often have unexpected side-effects. (I'm still pissed off
> about having to manually remove objfiles.txt to get it to rebuild a .o
> file, for instance.)

You mean the --enable-depend switch to ./configure is not sufficient to force a rebuild on changed source code! With this switch, I have always seen my builds do the right thing, whether I modify a .c file or a .h. I personally have never been forced to remove objfiles.txt to solve a build/make issue. (I primarily use VPATH builds, BTW.)

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/
EDB www.EnterpriseDB.com
[HACKERS] Cleaner build output when not much has changed
I was looking for ways to reduce the noise in Postgres make output, specifically, I wanted to eliminate the "Nothing to be done for `all' " messages, since they don't add much value, and just add to the clutter. Most of the solutions I have seen propose grepping out the noisy parts. But one of them proposed adding a .PHONY rule and then adding a no-op command to a target's recipe. Attached is a small patch to a few makefiles which helps remove the above-mentioned message. Following is a sample output: ... make[3]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/bin/initdb' make -C ../../../src/port all make[4]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/port' make -C ../backend submake-errcodes make[5]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/backend' make[5]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/backend' make[4]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/port' make -C ../../../src/common all make[4]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/common' make -C ../backend submake-errcodes make[5]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/backend' make[5]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/backend' make[4]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/common' make[3]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/bin/initdb' make[2]: Leaving directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/bin' make -C pl all make[2]: Entering directory `/home/gurjeet/dev/pgdbuilds/quiet_make/src/pl' ... The noise can be further reduced by adding the --no-print-directory switch, which yields the following output: ...
make -C ../backend submake-errcodes make -C psql all make -C ../../../src/interfaces/libpq all make -C ../../../src/port all make -C ../backend submake-errcodes make -C ../../../src/common all make -C ../backend submake-errcodes make -C scripts all make -C ../../../src/interfaces/libpq all make -C ../../../src/port all ... The only downside I see to this patch is that it emits this warning in the beginning: GNUmakefile:14: warning: overriding commands for target `all' src/Makefile.global:29: warning: ignoring old commands for target `all' This is from the recipe that emits the message "All of PostgreSQL successfully made. Ready to install." For really quiet builds one can use the -s switch, but for someone who wishes to see some kind of progress and also wants a cleaner terminal output, the --no-print-directory switch alone is not enough. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 8bfb77d..d0928d5 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -26,6 +26,7 @@ standard_always_targets = distprep clean distclean maintainer-clean # make `all' the default target all: + @true # Delete target files if the command fails after it has # started to update the file.
diff --git a/src/backend/Makefile b/src/backend/Makefile index 318cdbc..cbc6d55 100644 --- a/src/backend/Makefile +++ b/src/backend/Makefile @@ -124,6 +124,7 @@ submake-schemapg: # src/port needs a convenient way to force errcodes.h to get built submake-errcodes: $(top_builddir)/src/include/utils/errcodes.h + @true .PHONY: submake-schemapg submake-errcodes diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index c4d3f3c..ee8fbcc 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -55,7 +55,9 @@ postgres.description: postgres.bki ; postgres.shdescription: postgres.bki ; +.PHONY: schemapg.h schemapg.h: postgres.bki ; + @true # Technically, this should depend on Makefile.global, but then # postgres.bki would need to be rebuilt after every configure run,
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Wed, Nov 20, 2013 at 10:21 AM, Peter Eisentraut wrote: > On 11/5/13, 2:47 AM, Gurjeet Singh wrote: > > On Mon, Nov 4, 2013 at 12:20 AM, Tom Lane > <mailto:t...@sss.pgh.pa.us>> wrote: > > > > But we're not buying much. A few instructions during postmaster > > shutdown > > is entirely negligible. > > > > > > The patch is for ClosePostmasterPorts(), which is called from every > > child process startup sequence (as $subject also implies), not in > > postmaster shutdown. I hope that adds some weight to the argument. > > If there is a concern about future maintenance, you could add assertions > (in appropriate compile mode) that the rest of the array is indeed > PGINVALID_SOCKET. I think that could be a win for both performance and > maintainability. > Makes sense! Does the attached patch look like what you expected? I also added a comment to explain the expectation. Thanks and best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB Inc. www.EnterpriseDB.com diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index ccb8b86..9efc9fa 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2236,6 +2236,20 @@ ClosePostmasterPorts(bool am_syslogger) StreamClose(ListenSocket[i]); ListenSocket[i] = PGINVALID_SOCKET; } +else + { + /* + * Do not process the rest of the array elements since we expect + * the presence of an invalid socket id to mark the end of valid + * elements. + */ +#ifdef USE_ASSERT_CHECKING + int j; + for(j = i; j < MAXLISTEN; j++) +Assert(ListenSocket[j] == PGINVALID_SOCKET); +#endif +break; + } } /* If using syslogger, close the read side of the pipe */
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Wed, Nov 20, 2013 at 3:44 PM, Tom Lane wrote: > > To my mind, the "create a socket and hope nobody else can get to it" > approach is exactly one of the main things we're trying to avoid here. > If you'll recall, awhile back we had a big discussion about how pg_upgrade > could positively guarantee that nobody messed with the source database > while it was working, and we still don't have a bulletproof guarantee > there. I would like to fix that by making pg_upgrade use only standalone > backends to talk to the source database, never starting a real postmaster > at all. But if the standalone-pg_dump mode goes through a socket, we're > back to square one on that concern. > (I couldn't find the pg_upgrade-related thread mentioned above). I am not sure of the mechanics of this, but can we not launch the postmaster with a random magic-cookie, and use that cookie while initiating the connection from libpq? The postmaster will then reject any connections that don't provide the cookie. We do something similar to enable applications to send cancellation signals (postmaster.c:Backend.cancel_key), just that it's establishing trust in the opposite direction. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. www.enterprisedb.com
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Mon, Nov 4, 2013 at 12:20 AM, Tom Lane wrote: > But we're not buying much. A few instructions during postmaster shutdown > is entirely negligible. > The patch is for ClosePostmasterPorts(), which is called from every child process startup sequence (as $subject also implies), not in postmaster shutdown. I hope that adds some weight to the argument. Best regards, -- Gurjeet Singh gurjeet.singh.im EnterpriseDB Inc. www.enterprisedb.com
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Thu, Oct 31, 2013 at 11:20 PM, Tom Lane wrote: > Amit Kapila writes: > > On Thu, Oct 31, 2013 at 2:41 AM, Gurjeet Singh wrote: > >> Just a small patch; hopefully useful. > > > This is valid saving as we are filling array ListenSocket[] in > > StreamServerPort() serially, so during ClosePostmasterPorts() once if > > it encountered PGINVALID_SOCKET, it is valid to break the loop. > > Although savings are small considering this doesn't occur in any > > performance path, still I think this is right thing to do in code. > > > It is better to register this patch in CF app list, unless someone > > feels this is not right. > > I think this is adding fragility for absolutely no meaningful savings. > The existing code does not depend on the assumption that the array > is filled consecutively and no entries are closed early. Why should > we add such an assumption here? > IMHO, typical configurations have one, or maybe two of these array elements populated; one for TCP 5432 (for localhost or *), and the other for Unix Domain Sockets. Saving 62 iterations and comparisons in the startup sequence may not be much, but IMHO it does match logic elsewhere. In fact, this was inspired by the following code in ServerLoop(): ServerLoop() { ... /* * New connection pending on any of our sockets? If so, fork a child * process to deal with it. */ if (selres > 0) { int i; for (i = 0; i < MAXLISTEN; i++) { if (ListenSocket[i] == PGINVALID_SOCKET) break; if (FD_ISSET(ListenSocket[i], &rmask)) { And looking for other precedents, I found that the initMasks() function also relies on the array being filled consecutively, and the first PGINVALID_SOCKET marks the end of valid array members. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
[HACKERS] Shave a few instructions from child-process startup sequence
Just a small patch; hopefully useful. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index ccb8b86..48dc7af 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2236,6 +2236,8 @@ ClosePostmasterPorts(bool am_syslogger) StreamClose(ListenSocket[i]); ListenSocket[i] = PGINVALID_SOCKET; } +else +break; } /* If using syslogger, close the read side of the pipe */
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Thu, Jul 18, 2013 at 10:19 AM, Tom Lane wrote: > > > Because simpler code is less likely to have bugs and is easier to > > maintain. > > I agree with that point, but one should also remember Polya's Inventor's > Paradox: the more general problem may be easier to solve. That is, if > done right, code that fully flattens an AND tree might actually be > simpler than code that does just a subset of the transformation. The > current patch fails to meet this expectation, The current patch does completely flatten any type of tree (left/right-deep or bushy) without recursing, and right-deep and bushy tree processing is what Robert is recommending to defer to recursive processing. Maybe I haven't considered a case where it doesn't flatten the tree; do you have an example in mind? > but maybe you just haven't > thought about it the right way. > > My concerns about this patch have little to do with that, though, and > much more to do with the likelihood that it breaks some other piece of > code that is expecting AND/OR to be strictly binary operators, which > is what they've always been in parsetrees that haven't reached the > planner. It doesn't appear to me that you've done any research on that > point whatsoever No, I haven't, and I might not be able to research it for a few more weeks. > you have not even updated the comment for BoolExpr > (in primnodes.h) that this patch falsifies. > I will fix that. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Wed, Jul 17, 2013 at 1:25 PM, Robert Haas wrote: > On Mon, Jul 15, 2013 at 12:45 AM, Gurjeet Singh wrote: > > Agreed that there's overhead in allocating list items, but is it more > > overhead than pushing functions on the call stack? Not sure, so I leave > it > > to others who understand such things better than I do. > > If you think that a palloc can ever be cheaper than pushing a frame on > the callstack, you're wrong. palloc is not some kind of an atomic > primitive. It's implemented by the AllocSetAlloc function, and you're > going to have to push that function on the call stack, too, in order > to run it. > Agreed. I take my objection back. Even if AllocSetAlloc() reuses memory that was pfree'd earlier, it'll still be at least as expensive as recursing. > > My main point here is that if the user writes a = 1 and b = 1 and c = > 1 and d = 1, they're not going to end up with a bushy tree. They're > going to end up with a tree that's only deep in one direction (left, I > guess) and that's the case we might want to consider optimizing. To > obtain a bushy tree, they're going to have to write a = 1 and (b = 1 > and c = 1) and d = 1, or something like that, and I don't see why we > should stress out about that case. It will be rare in practice. > In v6 of the patch, I have deferred the 'pending' list initialization until we actually hit a candidate right-branch. So in the common case the pending list will never be populated, and if we find a bushy or right-deep tree (for some reason an ORM/tool may choose to build AND/OR lists that end up being right-deep in Postgres), then the pending list will be used to process them iteratively. Does that alleviate your concern about 'pending' list management causing an overhead?
Agreed that bushy/right-deep trees are a remote corner case, but we are addressing a remote corner case in the first place (insanely long AND lists), so why not handle another remote corner case right now if it doesn't cause an overhead for the common case? Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
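For readers following the thread, the lazily-created 'pending' list described above can be sketched outside the parser as follows. This is a toy Python illustration of the idea, not the actual transformExpr code: nodes are tuples standing in for the parser's A_Expr, left branches are consumed in a loop, and a right branch is pushed onto the pending list only when it is itself the same boolean operator, so the common left-deep case never allocates the list at all.

```python
# Toy sketch of iterative AND/OR flattening with a lazily-created pending
# list (illustration only, not PostgreSQL parser code).  A node is either a
# leaf (any non-tuple value) or a tuple (kind, left, right).

def flatten_bool(node, kind="AND"):
    """Collect the leaf operands of a possibly bushy binary boolean tree."""
    leaves = []
    pending = None                      # created lazily, only for bushy trees
    while True:
        # Walk down the chain of left branches iteratively.
        while isinstance(node, tuple) and node[0] == kind:
            _, left, right = node
            if isinstance(right, tuple) and right[0] == kind:
                if pending is None:
                    pending = []
                pending.append(right)   # defer a same-kind right subtree
            else:
                leaves.append(right)
            node = left
        leaves.append(node)             # leftmost leaf of this chain
        if not pending:
            break
        node = pending.pop()
    return leaves

# Left-deep chain, as produced by "a AND b AND c AND d":
left_deep = ("AND", ("AND", ("AND", "a", "b"), "c"), "d")
# Bushy tree, as produced by "(a AND b) AND (c AND d)":
bushy = ("AND", ("AND", "a", "b"), ("AND", "c", "d"))

print(sorted(flatten_bool(left_deep)))  # ['a', 'b', 'c', 'd']
print(sorted(flatten_bool(bushy)))      # ['a', 'b', 'c', 'd']
```

Since the SQL standard does not mandate an evaluation order for AND/OR operands, collecting the leaves in a different order than depth-first recursion would is harmless to the result.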
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Wed, Jul 17, 2013 at 8:21 AM, Gurjeet Singh wrote: > > What's the procedure of moving a patch to the next commitfest? > Never mind, I see an email from Josh B. regarding this on my corporate account. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Tue, Jul 16, 2013 at 4:04 PM, Pavel Stehule wrote: > I did a some performance tests of v5 and v6 version and there v5 is > little bit faster than v6, and v6 has significantly higher stddev > Thanks Pavel. The difference in average seems negligible, but stddev is interesting because v6 does less work than v5 in common cases and in the test that I had shared. The current commitfest (2013-06) is marked as 'In Progress', so is it okay to just mark the patch as 'Ready for Committer', or should I move it to the next commitfest (2013-09)? What's the procedure of moving a patch to the next commitfest? Do I make a fresh submission there with a link to the current submission, or is the move doable somehow in the application itself? Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Sun, Jul 14, 2013 at 8:27 PM, Robert Haas wrote: > On Wed, Jul 10, 2013 at 9:02 PM, Josh Berkus wrote: > >> I think it's a waste of code to try to handle bushy trees. A list is > >> not a particularly efficient representation of the pending list; this > >> will probably be slower than recursing in the common case. I'd suggest > >> keeping the logic to handle left-deep trees, which I find rather > >> elegant, but ditching the pending list. > Somehow I find it hard to believe that recursing would be more efficient than processing the items right there. The recursion is not direct either; transformExprRecurse() is going to call this function again, but after a few more switch-case comparisons. Agreed that there's overhead in allocating list items, but is it more overhead than pushing functions on the call stack? Not sure, so I leave it to others who understand such things better than I do. If by common case you mean a list of just one logical AND/OR operator, then I agree that creating and destroying a list may incur overhead that is relatively very expensive. To that end, I have altered the patch, attached, to not build a pending list until we encounter a node with root_expr_kind in a right branch. We're getting bushy-tree processing with very little extra code, but if you deem it not worthwhile or adding complexity, please feel free to rip it out. > > > > Is there going to be further discussion of this patch, or do I return it? > > Considering it's not been updated, nor my comments responded to, in > almost two weeks, I think we return it at this point. > Sorry, I didn't notice that this patch was put back in 'Waiting on Author' state. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. non_recursive_and_or_transformation_v6.patch Description: Binary data
[HACKERS] Fwd: review: Non-recursive processing of AND/OR lists
On Sun, Jun 30, 2013 at 11:46 AM, Pavel Stehule wrote: > 2013/6/30 Gurjeet Singh : > > On Sun, Jun 30, 2013 at 11:13 AM, Pavel Stehule > > > wrote: > > > > How about naming those 3 variables as follows: > > > > root_expr_kind > > root_expr_name > > root_bool_expr_type > > +1 Thanks. Attached is the patch with that change. I'll update the commitfest entry with a link to this email. -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. non_recursive_and_or_transformation_v5.patch Description: Binary data
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Sun, Jun 30, 2013 at 11:46 AM, Pavel Stehule wrote: > 2013/6/30 Gurjeet Singh : > > On Sun, Jun 30, 2013 at 11:13 AM, Pavel Stehule > > > wrote: > > > > How about naming those 3 variables as follows: > > > > root_expr_kind > > root_expr_name > > root_bool_expr_type > > +1 Thanks. Attached is the patch with that change. I'll update the commitfest entry with a link to this email. -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Sun, Jun 30, 2013 at 11:13 AM, Pavel Stehule wrote: > Hello > > just one small notices > > I dislike a name "root_bool_expr", because, there is not a expression, > but expression type. Can you use "root_bool_expr_type" instead? It is > little bit longer, but more correct. Same not best name is > "root_char", maybe "root_bool_op_name" > > or root_expr_type and root_op_name ??? > How about naming those 3 variables as follows: root_expr_kind root_expr_name root_bool_expr_type -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
On Tue, Jun 18, 2013 at 3:01 PM, Pavel Stehule wrote: > > related to > > https://commitfest.postgresql.org/action/patch_view?id=1130 > > http://www.postgresql.org/message-id/cabwtf4v9rsjibwe+87pk83mmm7acdrg7sz08rq-4qyme8jv...@mail.gmail.com > > > * motivation: remove recursive procession of AND/OR list (hangs with > 10062 and more subexpressions) > > * patch is short, clean and respect postgresql source code requirements > * patch was applied cleanly without warnings > * all regression tests was passed > * I successfully evaluated expression with 10 subexpressions > * there is no significant slowdown > > possible improvements > > a = (A_Expr*) list_nth(pending, 0); > > a = (A_Expr*) linitial(pending); > I made that change, hesitantly. The comments above the definition of the linitial() macro describe the confusion that API causes. I wanted to avoid that confusion for new code, so I used the newer API which makes the intention quite clear. But looking at that code closely, list_nth() causes at least 2 function calls, and that's pretty heavy compared to the linitial() macro. > > not well comment > > should be -- "If the right branch is also an SAME condition, append it to > the" > I moved that comment above the outer block, so that the intention of the whole do-while code block is described in one place. > I don't see any other issues, so after fixing comments this patch is > ready for commit > Thanks for the review Pavel. Attached is the updated patch, v4. It has the above edits, and a few code improvements, like not repeating the (root_kind == AEXPR_AND ? .. : ..) ternary expression. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. non_recursive_and_or_transformation_v4.patch Description: Binary data
Re: [HACKERS] Config reload/restart preview
On Thu, Jun 20, 2013 at 9:33 AM, Magnus Hagander wrote: > On Thu, Jun 20, 2013 at 2:54 PM, Dimitri Fontaine > wrote: > > Magnus Hagander writes: > >>> Should we have a way of previewing changes that would be applied if we > >>> reloaded/restarted the server? > >> > >> Yes, we should. > > > > +1 > > > >> This would go well with something I started working on some time ago > >> (but haven't actually gotten far on at all), which is the ability for > >> pg_ctl to be able to give feedback at all. Meaning a "pg_ctl reload" > >> should also be able to tell you which parameters were changed, without > >> having to go to the log. Obviously that's almost exactly the same > >> feature. > > > > It could probably connect to the server and issue the SQL command to > > reload, and that one could probably get enhanced to return modified > > variable as NOTICE, or be run with the right client_min_message: > > > > SELECT pg_reload_conf(); > > > > The pg_ctl client would then have to know to display the messages sent > > back by the server. > > The problem with that is that now you must *always* have your system > set up to allow the postgres user to log in in pg_hba.conf or it > fails. > > But yes, one option would be to use SQL instead of opening a socket. > Maybe that's a better idea - have pg_ctl try to use that if available, > and if not send a regular signal and not try to collect the output. > I started working on it yesterday after Thom proposed this idea internally at EDB. The discussion until now doesn't seem to be hostile to a SQL interface, so attached is a hack attempt, which hopefully will serve at least as a POC. A sample session is shown below, while changing a few values in postgresql.conf files. The central idea is to use the SIGHUP processing function to do the work for us and report potential changes via DEBUG2, instead of having to write a new parsing engine. 
The (GUC-nesting + PGC_S_TEST) is nice to have since it prevents the current session from adopting the values that are different in the conf file. This approach is susceptible to the fact that the connected superuser may have its GUC values picked up from user/database/session level settings (ALTER USER/DATABASE .. SET ; or SET param TO val;). $ pgsql Expanded display is used automatically. psql (9.4devel) Type "help" for help. postgres=# show work_mem; work_mem -- 1MB (1 row) postgres=# set client_min_messages = debug2; SET postgres=# select pg_test_reload_conf(); DEBUG: parameter "work_mem" changed to "70MB" pg_test_reload_conf - t (1 row) postgres=# show work_mem; work_mem -- 1MB (1 row) postgres=# select pg_test_reload_conf(); DEBUG: parameter "shared_buffers" cannot be changed without restarting the server DEBUG: configuration file "/home/gurjeet/dev/pgdbuilds/report_guc_chanege_pre_reload/db/data/postgresql.conf" contains errors; unaffected changes were applied pg_test_reload_conf - t (1 row) postgres=# select pg_test_reload_conf(); DEBUG: parameter "log_min_messages" removed from configuration file, reset to default pg_test_reload_conf - t (1 row) Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. report_conf_changes_without_applying.patch Description: Binary data
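Conceptually, the preview boils down to diffing a candidate configuration against the live settings and emitting notices without applying anything. The following is a toy Python model of that behavior, mirroring the DEBUG messages in the session above; the function name, the dict-based representation, and the restart-only classification are illustrative stand-ins, not the real GUC machinery:

```python
# Toy model of a "reload preview": report what a reload WOULD change,
# leaving the live settings untouched (illustration only).

def preview_reload(live, candidate, restart_only=()):
    """Return notice strings describing the effect of a would-be reload."""
    notices = []
    for name, value in candidate.items():
        if live.get(name) == value:
            continue                    # unchanged, nothing to report
        if name in restart_only:
            notices.append('parameter "%s" cannot be changed without '
                           'restarting the server' % name)
        else:
            notices.append('parameter "%s" changed to "%s"' % (name, value))
    for name in live:
        if name not in candidate:
            notices.append('parameter "%s" removed from configuration file, '
                           'reset to default' % name)
    return notices

live = {"work_mem": "1MB", "log_min_messages": "debug2"}
candidate = {"work_mem": "70MB"}        # log_min_messages was removed
for msg in preview_reload(live, candidate):
    print("DEBUG:", msg)
```

The key design point the POC exploits is that the live settings are never written: the diff is computed against a copy (via GUC nesting and PGC_S_TEST in the real patch), so previewing is side-effect free.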
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
Thanks for the review Pavel. On Tue, Jun 18, 2013 at 3:01 PM, Pavel Stehule wrote: > Hello > > related to > > https://commitfest.postgresql.org/action/patch_view?id=1130 > > http://www.postgresql.org/message-id/cabwtf4v9rsjibwe+87pk83mmm7acdrg7sz08rq-4qyme8jv...@mail.gmail.com > > > * motivation: remove recursive procession of AND/OR list (hangs with > 10062 and more subexpressions) > > * patch is short, clean and respect postgresql source code requirements > * patch was applied cleanly without warnings > * all regression tests was passed > * I successfully evaluated expression with 10 subexpressions > * there is no significant slowdown > > possible improvements > > a = (A_Expr*) list_nth(pending, 0); > > a = (A_Expr*) linitial(pending); > > not well comment > > should be -- "If the right branch is also an SAME condition, append it to > the" > > + /* > +* If the right branch is also an AND condition, > append it to the > +* pending list, to be processed later. This > allows us to walk even > +* bushy trees, not just left-deep trees. > +*/ > + if (IsA(a->rexpr, A_Expr) && > ((A_Expr*)a->rexpr)->kind == root_kind) > + { > + pending = lappend(pending, a->rexpr); > + } > + else > + { > + expr = transformExprRecurse(pstate, > a->rexpr); > + expr = coerce_to_boolean(pstate, expr, > root_kind == AEXPR_AND ? > "AND" : "OR"); > + exprs = lcons(expr, exprs); > + } > > I don't see any other issues, so after fixing comments this patch is > ready for commit > > Regards > > Pavel Stehule > -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Processing long AND/OR lists
On Mon, May 27, 2013 at 10:32 AM, Christopher Browne wrote: > On Mon, May 27, 2013 at 1:42 AM, Gurjeet Singh wrote: > >> >> >>> Joking about "640K" aside, it doesn't seem reasonable to expect a truly >>> enormous query as is generated by the broken forms of this logic to turn >>> out happily. I'd rather fix Slony (as done in the above patch). >>> >> >> Yes, by all means, fix the application, but that doesn't preclude the >> argument that the database should be a bit more smarter and efficient, >> especially if it is easy to do. > > > Agreed, it seems like a fine idea to have the database support such > queries, as this eases coping with applications that might be more > difficult to get fixed. > Seeing no more objections to it, I am going to add this patch to the commitfest. Attached is an updated patch against the latest master; it's the same as the previous version, except that the patch now includes a fix for the failing test case as well. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. non_recursive_and_or_transformation_v3.patch Description: Binary data
[HACKERS] pgbench: introduce a new automatic variable 'client_number'
Please find attached a patch for pgbench that introduces a new auto-variable 'client_number'. Following in the footsteps of the 'scale' auto-variable, this is not declared if the user has specified this variable using the -D switch. Since 'clientid' is a very common name a user can use for their own script's variable, I chose to call this auto-variable client_number, just to avoid conflicts. This variable can come in handy when you want to use a different expression in a query depending on which client is executing it. An example custom transaction is attached, where the UPDATE statement from any given client always updates the same logical row. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. pgbench_add_cleint_number_variable.patch Description: Binary data test_update.sql Description: Binary data
Re: [HACKERS] Processing long AND/OR lists
My last email was written before reading this. A few episodes of 24 occurred between writing and sending that email. Added slony1-hackers, but didn't remove pgsql-hackers. Feel free to exclude pgsql lists, as this branch of conversation seems to be more Slony related than Postgres. On Sun, May 26, 2013 at 10:59 PM, Christopher Browne wrote: > > In Slony 2.1, the issue re-emerged because the ordering of the "action id" > values was lost; the query had previously been implicitly forcing them into > order; we had to add an "ORDER BY" clause, to make the "compressor" work > again. > > http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blobdiff;f=src/slon/remote_worker.c;h=b1f48043f8e25b4a74a392b0dbceeae8f3e18c27;hp=7fbf67c16f97cb7c3f209cf3be903ea52c4490a9;hb=c4ac435308a78a2db63bf267d401d842c169e87d;hpb=d4612aab78bac5a9836e3e2425c403878f7091c8 > > Commit log says it was fixed between 2.1.2, but from the Slony logs at the time, the version in use was 2.1.2. So > Joking about "640K" aside, it doesn't seem reasonable to expect a truly > enormous query as is generated by the broken forms of this logic to turn > out happily. I'd rather fix Slony (as done in the above patch). > Yes, by all means, fix the application, but that doesn't preclude the argument that the database should be a bit more smarter and efficient, especially if it is easy to do. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Processing long AND/OR lists
On Sun, May 26, 2013 at 11:46 AM, Tom Lane wrote: > Josh Berkus writes: > > ***15,000***? I'd say that someone has an application design issue. > > Fixing the stack overflow is a good thing, but that query is never going > > to return ... > Just for the record, it does finish in 5 sec on my laptop. But yes, point taken. > > Yeah, the parser's stack consumption seems like only the tip of the > iceberg here. > True. After getting past the parser, the bigger lists consume exponentially longer times around process_equivalence(). And BTW, a backend stuck in that stack does not respond to pg_cancel/terminate_backend()! Should there be a CHECK_FOR_INTERRUPTS() in process_equivalence(), perhaps invoked only every N calls of that function? > > I find it hard to visualize a use-case for so many AND'ed conditions > anyway. I could believe a lot of OR'd conditions, but very likely it > would be a list that could and should be converted to an IN condition. > As noted earlier, this was seen in real-world, at a customer setup, generated by Slony, a highly regarded community project. Also, the patch addresses the stack limit for long OR clauses as well. Anytime such conditions are generated programmatically, there's always a possibility that the programmer underestimated the amount of data they are generating the query for; just like it happened in Slony's case I quoted. Slony has a compress_actionseq() function to scan a list of numbers, and generate a smallest possible WHERE clause. If it sees consecutive numbers that form a range, it turns that range into a BETWEEN clause, and the numbers that don't seem to be in a range are added to the where clause as 'AND col <> n1 AND col <> n2 ...'. It's quite likely that it generates such long lists because it wrongly assumes that the input list is ordered, or at least mostly ordered. Agreed that that function can/should be fixed to do a better job of sorting these numbers before scanning, and also using the NOT IN construct.
But the point remains that Postgres should be capable of handling this simple construct efficiently. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
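Slony's actual compress_actionseq() lives in Slony's C sources and is not shown here. Purely as an illustration of the fix suggested above (sort the ids first so runs become detectable, emit BETWEEN-style ranges for the runs, and collect the stragglers into one non-recursive NOT IN), a hypothetical sketch could look like:

```python
def compress_actionseq(ids, col="log_actionseq"):
    """Illustrative sketch (not Slony's actual code): build a compact WHERE
    clause excluding the given ids.  Sorting first makes runs detectable;
    runs of 3+ become NOT BETWEEN ranges, and leftover singletons go into
    a single NOT IN list, which Postgres processes non-recursively
    (unlike a chain of 'AND col <> n' conditions)."""
    ids = sorted(set(ids))
    ranges, singles = [], []
    i = 0
    while i < len(ids):
        j = i
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1                          # extend the consecutive run
        if j - i >= 2:                      # a run of 3+ is worth a range
            ranges.append((ids[i], ids[j]))
        else:
            singles.extend(ids[i:j + 1])
        i = j + 1
    parts = [f"{col} NOT BETWEEN {lo} AND {hi}" for lo, hi in ranges]
    if singles:
        parts.append(f"{col} NOT IN ({', '.join(map(str, singles))})")
    return " AND ".join(parts) or "TRUE"

print(compress_actionseq([5, 1, 2, 3, 9]))
# → log_actionseq NOT BETWEEN 1 AND 3 AND log_actionseq NOT IN (5, 9)
```

Note how an unordered input still collapses into one range plus one NOT IN list, instead of thousands of AND'ed `<>` conditions.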
Re: [HACKERS] Processing long AND/OR lists
On Sat, May 25, 2013 at 9:56 AM, Gurjeet Singh wrote: > When Postgres encounters a long list of AND/OR chains, it errors out at > check_stack_depth() after a limit of few thousand. At around 10,000 > elements, the recursion at assign_expr_collations() causes the error. But > at a little higher element count, around 15,000, the recursion check errors > out a little earlier, in the stack around transformAExprAnd(). The test > queries were generated using the attached test.sh script. > > This is not a synthetic test. Recently I saw a report where Slony > generated a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and > ...' for a SYNC event, and that SYNC event could not complete due to this > error. Granted that Slony can be fixed by making it generate the condition > as 'log_actionseq not in (nnn, nnn)' which is processed non-recursively, > but IMHO Postgres should be capable of handling this trivial construct, > however long it may be (of course, limited by memory). > > To that end, I wrote the attached patch, and although I seem to be playing > by the rules, `make check` fails. Some diffs look benign, but others not so > much. > > AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a > list. And since SQL standard does not say anything about short-circuit > evaluation, the evaluation order of these expressions should not affect the > results. So clearly turning a tree of booleans expressions into a list is > not so subtle a change. I suspect that this patch is messing up the join > conditions. > > I see two ways to fix it: > 1) Fix the order of expressions in BoolExpr such that the order of > evaluation remains unchanged compared to before the patch. > I expect this to take care of the benign diffs. But I doubt this will > address the other diffs. > > 2) Construct a tree same as done before the patch, but do it > non-recursively. 
> This is I guess the right thing to do, but then we'll have to make
> similar tree-traversal changes in other places, for eg. in BoolExpr walking
> in expression_tree_walker() or maybe just the stack below
> assign_expr_collations().

As suspected, the problem was that a JOIN's USING clause processing cooks up its own join conditions, and those were the ones that couldn't cope with the changed order of output. Fixing the order of arguments of the BoolExpr also fixed all the JOIN-related diffs. Now `make check` passes except for this benign diff:

| WHERE (((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm)) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm));
| WHERE ((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm));

That's expected, since for cases like these, the patch converts what used to be a tree of 2-argument BoolExprs into a single BoolExpr with many arguments.

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.

non_recursive_and_or_transformation_v2.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
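The actual patch is C against the parser; as a language-neutral sketch of the idea — flattening a nested tree of 2-argument AND nodes into one N-ary node iteratively (so no stack-depth limit) while preserving the left-to-right argument order that the JOIN USING conditions depend on — one might write (plain tuples stand in for BoolExpr nodes here):

```python
def flatten_and(expr):
    """Flatten a nested tree of 2-argument ('AND', left, right) nodes into a
    single N-ary ('AND', a, b, c, ...) node.  Done iteratively, so a chain
    of 10,000+ nodes cannot blow the recursion limit, and the original
    left-to-right argument order is preserved -- the ordering detail that
    broke the JOIN USING diffs in the first cut of the patch.
    (Illustrative only; tuples stand in for the C BoolExpr structs.)"""
    args, stack = [], [expr]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple) and node and node[0] == 'AND':
            # push children right-to-left so the left child is emitted first
            stack.extend(reversed(node[1:]))
        else:
            args.append(node)
    return ('AND', *args)
```

For example, `flatten_and(('AND', ('AND', 'a', 'b'), 'c'))` yields `('AND', 'a', 'b', 'c')`, matching the benign diff above where nested 2-argument BoolExprs become one many-argument BoolExpr.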
[HACKERS] Processing long AND/OR lists
When Postgres encounters a long list of AND/OR chains, it errors out at check_stack_depth() after a limit of a few thousand elements. At around 10,000 elements, the recursion at assign_expr_collations() causes the error. But at a little higher element count, around 15,000, the recursion check errors out a little earlier, in the stack around transformAExprAnd(). The test queries were generated using the attached test.sh script.

This is not a synthetic test. Recently I saw a report where Slony generated a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and ...' for a SYNC event, and that SYNC event could not complete due to this error. Granted, Slony can be fixed by making it generate the condition as 'log_actionseq not in (nnn, nnn)', which is processed non-recursively, but IMHO Postgres should be capable of handling this trivial construct, however long it may be (limited only by memory, of course).

To that end, I wrote the attached patch, and although I seem to be playing by the rules, `make check` fails. Some diffs look benign, but others not so much. AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a list. And since the SQL standard does not say anything about short-circuit evaluation, the evaluation order of these expressions should not affect the results. So clearly turning a tree of boolean expressions into a list is not so subtle a change. I suspect that this patch is messing up the join conditions.

I see two ways to fix it:

1) Fix the order of expressions in BoolExpr such that the order of evaluation remains unchanged compared to before the patch. I expect this to take care of the benign diffs, but I doubt it will address the other diffs.

2) Construct the same tree as before the patch, but do it non-recursively. This is, I guess, the right thing to do, but then we'll have to make similar tree-traversal changes in other places, e.g., in the BoolExpr walking in expression_tree_walker(), or maybe just the stack below assign_expr_collations().

Best regards,

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.

non_recursive_and_or_transformation.patch Description: Binary data

test.sh Description: Bourne shell script
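The attached test.sh itself is not reproduced in the archive; as a hedged stand-in (in Python rather than shell, with illustrative names), a generator for queries of the shape that trips check_stack_depth() might look like:

```python
def long_and_query(n, col="log_actionseq"):
    """Build a SELECT with n AND'ed '<>' conditions -- the query shape
    reported to hit the stack-depth limit.  This is a stand-in for the
    attached test.sh, not the original script."""
    conds = " AND ".join(f"{col} <> {i}" for i in range(1, n + 1))
    return f"SELECT 1 WHERE {conds};"

# around 15,000 conditions was enough to hit the stack-depth error
query = long_and_query(15000)
```

Feeding the generated statement to psql on an unpatched server should reproduce the "stack depth limit exceeded" error described above.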
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 3:26 PM, Bruce Momjian wrote: > On Fri, Apr 12, 2013 at 01:34:49PM -0400, Gurjeet Singh wrote: > > Can you also improve the output when it dies upon failure to fetch > something? > > Currently the only error message it emits is "fetching xyz", and leaves > the > > user confused as to what really the problem was. The only indication of a > > problem might be the exit code, but I'm not sure of that, and that > doesn't > > help the interactive user running it on terminal. > > Good point. I have reviewed all the error messages and improved them > with the attached, applied patch, e.g.: > > cannot locate typedefs file "xyz" at /usr/local/bin/pgindent line > 121. > Thanks! -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 11:44 AM, Bruce Momjian wrote:
> On Tue, Feb 19, 2013 at 04:50:45PM -0500, Gurjeet Singh wrote:
> > Please find attached the patch for some cleanup and fix bit rot in
> > pgindent script.
> >
> > There were a few problems with the script.
> >
> > .) It failed to use the $ENV{PGENTAB} even if it was set.
> > .) The file it tries to download from Postgres' ftp site is no longer present.
> >    ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz
> > .) The tar command extracted the above-mentioned file to a child directory, but
> >    did not descend into it before running make on it.
> >    I think it expected a tarbomb, but clearly the current .tgz file on ftp
> >    site is not a tarbomb.
> >
> > I don't like the fact that it dies with a message "fetching xyz" rather than
> > saying "Could not fetch xyz", but this patch does not address that since it
> > doesn't really affect the output when script does work.
> >
> > With this patch in place I could very easily run it on any arbitrary file,
> > which seemed like a black-magic before the patch.
> >
> > src/tools/pgindent/pgindent --build
>
> I have applied this patch. However, I modified the tarball name to
> reference $INDENT_VERSION, rather than hard-coding "1.2".

Thanks! Can you also improve the output when it dies upon failure to fetch something? Currently the only error message it emits is "fetching xyz", which leaves the user confused as to what the problem really was. The only indication of a problem might be the exit code, but I'm not sure of that, and that doesn't help an interactive user running it in a terminal.

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] [DOCS] synchronize_seqscans' description is a bit misleading
On Wed, Apr 10, 2013 at 11:56 PM, Tom Lane wrote:
> Gurjeet Singh writes:
> > So, again, it is not guaranteed that all the scans on a relation will
> > synchronize with each other. Hence my proposal to include the term
> > 'probability' in the definition.
>
> Yeah, it's definitely not "guaranteed" in any sense. But I don't really
> think your proposed wording is an improvement. The existing wording
> isn't promising guaranteed sync either, to my eyes.

Given Postgres' track record of delivering what it promises, I expect casual readers to take that phrase as a definitive guide to what is happening internally.

> Perhaps we could compromise on, say, changing "so that concurrent scans
> read the same block at about the same time" to "so that concurrent scans
> tend to read the same block at about the same time",

Given that, on first read, the word "about" did not deter me from assuming the best, I don't think adding "tend" would make much difference to a reader's (mis)understanding. Perhaps we can spare a few more words to make this clearer.

> or something like
> that. I don't mind making it sound a bit more uncertain, but I don't
> think that we need to emphasize the probability of failure.

I agree we don't want to stress the failure case too much, especially when it makes the performance no worse than the absence of the feature. But we don't want the reader to get the wrong idea either. In addition to the slight doc improvement being suggested, perhaps a wiki.postgresql.org entry would allow us to explain the behaviour in more detail.

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] [DOCS] synchronize_seqscans' description is a bit misleading
On Wed, Apr 10, 2013 at 11:10 PM, Tom Lane wrote:
> Gurjeet Singh writes:
> > If I'm reading the code right [1], this GUC does not actually *synchronize*
> > the scans, but instead just makes sure that a new scan starts from a block
> > that was reported by some other backend performing a scan on the same
> > relation.
>
> Well, that's the only *direct* effect, but ...
>
> > Since the backends scanning the relation may be processing the relation at
> > different speeds, even though each one took the hint when starting the
> > scan, they may end up being out of sync with each other.
>
> The point you're missing is that the synchronization is self-enforcing:
> whichever backend gets ahead of the others will be the one forced to
> request (and wait for) the next physical I/O. This will naturally slow
> down the lower-CPU-cost-per-page scans. The other ones tend to catch up
> during the I/O operation.

Got it. So far, so good. But consider a pathological case: if a scan is driven by a user-controlled cursor, whose speed depends on how fast the user presses the "Next" button, then that scan is quickly going to fall out of sync with the other scans. Moreover, if a new scan happens to pick up the block reported by this slow scan, then that new scan may have to read blocks off the disk afresh. So, again, it is not guaranteed that all the scans on a relation will synchronize with each other. Hence my proposal to include the term 'probability' in the definition.

> The feature is not terribly useful unless I/O costs are high compared to
> the CPU cost-per-page. But when that is true, it's actually rather
> robust. Backends don't have to have exactly the same per-page
> processing cost, because pages stay in shared buffers for a while after
> the current scan leader reads them.

Agreed. Even if the buffer has been evicted from shared_buffers, there's a high likelihood that a scan that's close on the heels of the others will fetch it from the FS cache.

> > Imagining that all scans on a table are always synchronized, may make some
> > wrongly believe that adding more backends scanning the same table will not
> > incur any extra I/O; that is, only one stream of blocks will be read from
> > disk no matter how many backends you add to the mix. I noticed this when I
> > was creating partition tables, and each of those was a CREATE TABLE AS
> > SELECT FROM original_table (to avoid WAL generation), and running more than
> > 3 such transactions caused the disk read throughput to behave unpredictably,
> > sometimes even dipping below 1 MB/s for a few seconds at a stretch.
>
> It's not really the scans that's causing that to be unpredictable, it's
> the write I/O from the output side, which is forcing highly
> nonsequential behavior (or at least I suspect so ... how many disk units
> were involved in this test?)

You may be right. I don't have access to the system anymore, and I don't remember the disk layout, but it's quite possible that write operations were causing the read throughput to drop. I did try to reproduce the behaviour on my laptop with up to 6 backends doing pure reads on a table several times the size of system RAM, but I could not get them to fall out of sync.

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
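Tom's "self-enforcing" argument can be illustrated with a toy simulation. This is not Postgres's actual machinery — the costs, cache model, and scheduling here are all made up — but it shows why the leader, who pays every physical I/O, cannot pull far ahead of backends that read the just-cached pages cheaply:

```python
import random

def simulate(n_backends=4, n_pages=2000, io_cost=10.0, seed=42):
    """Toy model (illustrative only, not Postgres internals): each backend
    pays a small, per-backend CPU cost per page; the first backend to touch
    an uncached page also pays a large I/O cost, after which the page is
    'in shared buffers' for everyone.  Whoever leads keeps paying the I/O
    and is slowed down, so the pack stays together."""
    rng = random.Random(seed)
    cpu = [1.0 + rng.random() for _ in range(n_backends)]  # per-page CPU cost
    clock = [0.0] * n_backends  # each backend's elapsed virtual time
    pos = [0] * n_backends      # next page each backend will read
    cached = set()
    max_spread = 0
    while min(pos) < n_pages:
        # the backend with the smallest clock acts next (crude concurrency)
        b = min((i for i in range(n_backends) if pos[i] < n_pages),
                key=lambda i: clock[i])
        page = pos[b]
        clock[b] += cpu[b] + (0.0 if page in cached else io_cost)
        cached.add(page)
        pos[b] += 1
        if all(p < n_pages for p in pos):  # measure only while all are scanning
            max_spread = max(max_spread, max(pos) - min(pos))
    return max_spread  # widest gap, in pages, observed between the scans

print(simulate())
```

In this model the spread between scans stays a small handful of pages out of thousands, while a backend throttled by an external actor (the cursor case above) would fall arbitrarily far behind, since nothing forces it to keep up.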
[HACKERS] synchronize_seqscans' description is a bit misleading
If I'm reading the code right [1], this GUC does not actually *synchronize* the scans, but instead just makes sure that a new scan starts from a block that was reported by some other backend performing a scan on the same relation. Since the backends scanning the relation may be processing it at different speeds, even though each one took the hint when starting the scan, they may end up being out of sync with each other. Even in a single query, there may be different scan nodes scanning different parts of the same relation, and even they don't synchronize with each other (and for good reason).

Imagining that all scans on a table are always synchronized may make some wrongly believe that adding more backends scanning the same table will not incur any extra I/O; that is, that only one stream of blocks will be read from disk no matter how many backends you add to the mix. I noticed this when I was creating partition tables, each populated with a CREATE TABLE AS SELECT FROM original_table (to avoid WAL generation); running more than 3 such transactions caused the disk read throughput to behave unpredictably, sometimes even dipping below 1 MB/s for a few seconds at a stretch.

Please note that I am not complaining about the implementation, which I think is the best we can do without making backends wait for each other. It's just that the documentation [2] implies that the scans stay synchronized through the entire run, which is clearly not the case. So I'd like the docs to be improved to reflect that. How about something like:

synchronize_seqscans (boolean)
This allows sequential scans of large tables to start from a point in the table that is already being read by another backend. This increases the probability that concurrent scans read the same block at about the same time and hence share the I/O workload. Note that, due to the difference in speeds of processing the table, the backends may eventually get out of sync, and hence stop sharing the I/O workload.
When this is enabled, ... The default is on. Best regards, [1] src/backend/access/heap/heapam.c [2] http://www.postgresql.org/docs/9.2/static/runtime-config-compatible.html#GUC-SYNCHRONIZE-SEQSCANS -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
[HACKERS] One-line comment to improve understanding of VARSIZE_ANY_EXHDR macro
Hopefully I am not wrong.

+/* Size of a varlena data, excluding header */
 #define VARSIZE_ANY_EXHDR(PTR) \

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.

exhdr_comment.patch Description: Binary data
[HACKERS] Patch to make pgindent work cleanly
Please find attached the patch for some cleanup and to fix bit rot in the pgindent script.

There were a few problems with the script.

.) It failed to use the $ENV{PGENTAB} even if it was set.
.) The file it tries to download from Postgres' ftp site is no longer present.
   ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz
.) The tar command extracted the above-mentioned file to a child directory, but did not descend into it before running make on it. I think it expected a tarbomb, but clearly the current .tgz file on the ftp site is not a tarbomb.

I don't like the fact that it dies with the message "fetching xyz" rather than saying "Could not fetch xyz", but this patch does not address that since it doesn't really affect the output when the script does work.

With this patch in place I could very easily run it on any arbitrary file, which seemed like black magic before the patch.

src/tools/pgindent/pgindent --build

-- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.

pgindent.patch Description: Binary data
Re: [HACKERS] Fwd: Successful post to pgsql-hackers
On Sat, Feb 9, 2013 at 5:09 PM, Euler Taveira wrote: > On 09-02-2013 13:45, Gurjeet Singh wrote: > > BTW, I hope I understand what selfcopy is: send a copy to yourself. Why > would > > that be turned on by default? > > > If you want to reply to yourself... Wouldn't I be using the copy in my "sent" folder to do that? I guess it is a non-issue for me since my mail provider (GMail) does not show me the mail I sent to myself as a duplicate; it at least seems that way. -- Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] Fwd: Successful post to pgsql-hackers
I tried the lists page and disabled 'selfcopy', since the 'ackpost' option was kinda hidden under the mailing list names. I have updated the settings again to disable ackpost and enable selfcopy. This email will test whether that worked. BTW, I hope I understand what selfcopy is: send a copy to yourself. Why would that be turned on by default? On Sat, Feb 9, 2013 at 10:24 AM, Magnus Hagander wrote: > It's in your personal majordomo settings. I don't think it's related > to that at all, but it's a flag you set on your subscription, so > perhaps you set it by mistake. > > You shold be able to set it from https://mail.postgresql.org. The > setting you're looking for is "ackpost", and you'll want to turn it > off. > > //Magnus > > > On Sat, Feb 9, 2013 at 4:17 PM, Gurjeet Singh wrote: > > Recently I have started getting these confirmations for every email I > send > > to the mailing lists. I think it's related to the fact that I recently > > switched to using a new email address. > > > > How can I turn these notifications off? > > > > -- Forwarded message -- > > From: > > Date: Sat, Feb 9, 2013 at 10:13 AM > > Subject: Successful post to pgsql-hackers > > To: Gurjeet Singh > > > > > > Your message to the pgsql-hackers list, posted on > > Sat, 9 Feb 2013 10:11:05 -0500 > > > > with subject > > Re: pg_prewarm > > > > is currently being delivered. > > > > > > > > -- > > Gurjeet Singh > > > > http://gurjeet.singh.im/ > > > > -- > Magnus Hagander > Me: http://www.hagander.net/ > Work: http://www.redpill-linpro.com/ > -- Gurjeet Singh http://gurjeet.singh.im/
[HACKERS] Fwd: Successful post to pgsql-hackers
Recently I have started getting these confirmations for every email I send to the mailing lists. I think it's related to the fact that I recently switched to using a new email address. How can I turn these notifications off? -- Forwarded message -- From: Date: Sat, Feb 9, 2013 at 10:13 AM Subject: Successful post to pgsql-hackers To: Gurjeet Singh Your message to the pgsql-hackers list, posted on Sat, 9 Feb 2013 10:11:05 -0500 with subject Re: pg_prewarm is currently being delivered. -- Gurjeet Singh http://gurjeet.singh.im/
Re: [HACKERS] pg_prewarm
On Sun, Jun 24, 2012 at 1:36 PM, Cédric Villemain wrote:
> On Saturday, June 23, 2012 02:47:15, Josh Berkus wrote:
> > > The biggest problem with pgfincore from my point of view is that it
> > > only works under Linux, whereas I use a MacOS X machine for my
> > > development, and there is also Windows to think about. Even if that
> > > were fixed, though, I feel we ought to have something in the core
> > > distribution. This patch got more +1s than 95% of what gets proposed
> > > on hackers.
> >
> > Fincore is only a blocker to this patch if we think pgfincore is ready
> > to be proposed for the core distribution. Do we?
>
> I'll make it ready for. (not a huge task).

Hi Cédric,

Can you please post the progress on this, if any? I am planning on polishing up pg_prewarm based on the reviews. As others have said, I don't see a reason why both can't coexist, maybe on PGXN. I am all ears if you think otherwise.

Best regards,

-- Gurjeet Singh http://gurjeet.singh.im/