Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 4:17 AM, Robert Haas  wrote:

> Here's what's bugging me.  Greg seemed to be assuming that the
> business of the background writer might be the cause of the
> performance drop-off he measured on certain test cases.  But you and I
> both seem to feel that the business of the background writer is
> intentional and desirable.  Supposing we're right, where's the
> drop-off coming from?  *scratches head*

Any source of logical I/O becomes physical I/O when we run short of
memory, so if we're using more memory for any reason, that will cause
more swapping. And if we are doing things like consulting the vmap,
that would also cause a problem.

I notice the issue is not as bad for 9.2 in the scale 4000 case, so it
seems more likely that we're just hitting the tipping point earlier on
9.2 and that scale 1000 is right in the middle of the tipping point.

What it does show quite clearly is that the extreme high-end response
time variability is still there. It also shows that insufficient
performance testing has been done on this release so far. We may have
"solved" some scalability problems, but we've completely ignored
real-world performance issues and, as Greg says, we now get to pay the
price for not having done that earlier.

I've argued previously that we should have a performance tuning phase
at the end of the release cycle, and now it looks like that has become
a necessity. That will turn out to be a good thing in the end, I'm
sure, even if it's a little worrying right now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-19 Thread Simon Riggs
On Tue, Feb 14, 2012 at 6:45 PM, Greg Smith  wrote:

> Minimal changes were made to the postgresql.conf.  shared_buffers=2GB,
> checkpoint_segments=64, and I left wal_buffers at its default so that 9.1
> got credit for that going up.  See
> http://highperfpostgres.com/results-write-9.2-cf4/541/pg_settings.txt for a
> full list of changes, drive mount options, and important kernel settings.
>  Much of that data wasn't collected in last year's pgbench-tools runs.

Please retest with wal_buffers = 128MB and checkpoint_segments = 1024.

Best to remove any tunable resource bottlenecks before we attempt
further analysis.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] pg_upgrade --logfile option documentation

2012-02-19 Thread Peter Eisentraut
The documentation of the pg_upgrade -l/--logfile option never made much
sense to me:

  -l, --logfile=FILENAME        log session activity to file

I don't know what "session" means for pg_upgrade, so I never used it.

What it actually does is log the output of all the programs that
pg_upgrade calls internally, such as pg_ctl, psql, vacuumdb,
pg_resetxlog, to the specified file, which is quite useful for analyzing
errors such as

unable to connect to new postmaster started with the command: 
"/usr/lib/postgresql/9.1/bin/pg_ctl" -w -l "/dev/null" -D 
"/var/lib/postgresql/9.1/main" -o "-p 5433 -b" start >> "/dev/null" 2>&1

where -l would have put something in the place of /dev/null.

So what might be a better wording for this option?  Something like "log
output of internally called programs to file"?





Re: [HACKERS] pg_restore ignores PGDATABASE

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 1:18 AM, Erik Rijkers  wrote:
> On Sun, February 19, 2012 06:27, Robert Haas wrote:
>> On Sat, Feb 18, 2012 at 11:58 AM, Erik Rijkers  wrote:
>>> pg_restore ignores environment variable PGDATABASE.
>>
>> What exactly do you mean by "ignores"?  pg_restore prints results to
>> standard output unless a database name is specified.  AFAIK, there's
>> no syntax to say "I want a direct-to-database restore to whatever you
>> think the default database is".
>
> That's right, and that seems contradictory with:
>
> "This utility [pg_restore], like most other PostgreSQL utilities, also uses 
> the environment
> variables supported by libpq (see Section 31.13)."
>
> as pg_restore does 'ignore' (for want of a better word) PGDATABASE.
>
> But I think I can conclude from your reply that that behaviour is indeed 
> intentional.

It is, because we want there to be a way of converting a custom or tar
format archive back to text.  I think that probably works out for the
best anyway, since pg_restore is a sufficiently dangerous operation
that you want to be darn sure you're not doing it on the wrong
database.  dropdb also requires a database name, while createdb does
not, for similar reasons...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers

2012-02-19 Thread Euler Taveira de Oliveira
On 19-02-2012 02:24, Robert Haas wrote:
> I have attached tps scatterplots.  The obvious conclusion appears to
> be that, with only 16MB of wal_buffers, the buffer "wraps around" with
> some regularity
>
Wouldn't it be useful to print a message to the log when we get a "wrap around"?
That way we would have an idea that wal_buffers needs to be increased.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] pg_restore ignores PGDATABASE

2012-02-19 Thread Andrew Dunstan



On 02/19/2012 08:02 AM, Robert Haas wrote:
> On Sun, Feb 19, 2012 at 1:18 AM, Erik Rijkers  wrote:
>> On Sun, February 19, 2012 06:27, Robert Haas wrote:
>>> On Sat, Feb 18, 2012 at 11:58 AM, Erik Rijkers  wrote:
>>>> pg_restore ignores environment variable PGDATABASE.
>>>
>>> What exactly do you mean by "ignores"?  pg_restore prints results to
>>> standard output unless a database name is specified.  AFAIK, there's
>>> no syntax to say "I want a direct-to-database restore to whatever you
>>> think the default database is".
>>
>> That's right, and that seems contradictory with:
>>
>> "This utility [pg_restore], like most other PostgreSQL utilities, also uses the
>> environment variables supported by libpq (see Section 31.13)."
>>
>> as pg_restore does 'ignore' (for want of a better word) PGDATABASE.
>>
>> But I think I can conclude from your reply that that behaviour is indeed
>> intentional.
>
> It is, because we want there to be a way of converting a custom or tar
> format archive back to text.  I think that probably works out for the
> best anyway, since pg_restore is a sufficiently dangerous operation
> that you want to be darn sure you're not doing it on the wrong
> database.  dropdb also requires a database name, while createdb does
> not, for similar reasons...


Right, I think we probably need to adjust the docs slightly to match 
this reality.


cheers

andrew




Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Simon Riggs
On Thu, Feb 16, 2012 at 1:53 PM, Robert Haas  wrote:

> +                       /*
> +                        * If we're in recovery we cannot dirty a page 
> because of a hint.
> +                        * We can set the hint, just not dirty the page as a 
> result so
> +                        * the hint is lost when we evict the page or 
> shutdown.
> +                        *
> +                        * See long discussion in bufpage.c
> +                        */
> +                       if (RecoveryInProgress())
> +                               return;
>
> Doesn't this seem awfully bad for performance on Hot Standby servers?
> I agree that it fixes the problem with un-WAL-logged pages there, but
> I seem to recall some recent complaining about performance features
> that work on the master but not the standby.  Durable hint bits are
> one such feature.

It's impossible for it to work, in this case, since we cannot write
new WAL to prevent torn pages.

Note that hint bit setting on a dirty block is allowed, so many hints
will still be set in Hot Standby.

> +                        * Basically, we simply prevent the checkpoint WAL 
> record from
> +                        * being written until we have marked the buffer 
> dirty. We don't
> +                        * start the checkpoint flush until we have marked 
> dirty, so our
> +                        * checkpoint must flush the change to disk 
> successfully or the
> +                        * checkpoint never gets written, so crash recovery 
> will fix.
> +                        *
> +                        * It's possible we may enter here without an xid, so 
> it is
> +                        * essential that CreateCheckpoint waits for virtual 
> transactions
> +                        * rather than full transactionids.
> +                        */
> +                       MyPgXact->delayChkpt = delayChkpt = true;
>
> I am slightly worried that this expansion in the use of this mechanism
> (formerly called inCommit, for those following along at home) could
> lead to checkpoint starvation.  Suppose we've got one or two large
> table scans wandering along, setting hint bits, and now suddenly it's
> time to checkpoint.  How long will it take the checkpoint process to
> find a time when nobody's got delayChkpt set?

We don't need to wait until nobody has it set, we just need to wait
for the people that had it set when we first checked to be out of that
state momentarily.
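As an illustrative sketch of that waiting rule (not the actual PostgreSQL code; PGPROC here is a stand-in struct and SnapshotDelayedProcs a hypothetical helper): the checkpointer collects only the backends that have delayChkpt set at the moment it checks, and waits for each of those to clear it once, so backends that set the flag later can't starve the checkpoint:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the real PGPROC; only the flag we care about. */
typedef struct PGPROC
{
	volatile bool delayChkpt;
} PGPROC;

/*
 * Collect the procs that currently have delayChkpt set into 'watch'.
 * The checkpointer would then wait for each watched proc to clear the
 * flag once; procs setting the flag after this snapshot are ignored.
 */
static size_t
SnapshotDelayedProcs(PGPROC *procs, size_t nprocs, PGPROC **watch)
{
	size_t		n = 0;

	for (size_t i = 0; i < nprocs; i++)
		if (procs[i].delayChkpt)
			watch[n++] = &procs[i];
	return n;
}
```

The fixed snapshot is what bounds the wait: each watched backend only has to leave the delaying state momentarily.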

> + #define PageSetChecksum(page) \
> + do \
> + { \
> +       PageHeader      p = (PageHeader) page; \
> +       p->pd_flags |= PD_PAGE_VERSION_PLUS1; \
> +       p->pd_flags |= PD_CHECKSUM1; \
> +       p->pd_flags &= ~PD_CHECKSUM2; \
> +       p->pd_verify.pd_checksum16 = PageCalcChecksum16(page); \
> + } while (0);
> +
> + /* ensure any older checksum info is overwritten with watermark */
> + #define PageResetVersion(page) \
> + do \
> + { \
> +       PageHeader      p = (PageHeader) page; \
> +       if (!PageHasNoChecksum(p)) \
> +       { \
> +               p->pd_flags &= ~PD_PAGE_VERSION_PLUS1; \
> +               p->pd_flags &= ~PD_CHECKSUM1; \
> +               p->pd_flags &= ~PD_CHECKSUM2; \
> +               PageSetPageSizeAndVersion(p, BLCKSZ, PG_PAGE_LAYOUT_VERSION); 
> \
> +       } \
> + } while (0);
>
> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
> doesn't have a checksum, PD_CHECKSUM2 is not set?  What good does that
> do?

As explained in detailed comments, the purpose of this is to implement
Heikki's suggestion that we have a bit set to zero so we can detect
failures that cause a run of 1s.
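To illustrate the idea (with made-up flag values; the real PD_* bits are defined in bufpage.h and may differ): requiring one flag bit to be 1 and another to be 0 means a fault that forces the flag word to all-ones, or all-zeros, cannot masquerade as "checksum present":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only, not the actual header layout. */
#define PD_CHECKSUM1	0x0004	/* must be 1 when a checksum is present */
#define PD_CHECKSUM2	0x0008	/* must be 0 when a checksum is present */

/*
 * A page claims to carry a checksum only if CHECKSUM1 is set AND
 * CHECKSUM2 is clear; a run of 1s (or 0s) across the flag word fails
 * one of the two conditions.
 */
static inline bool
PageHasChecksum(uint16_t pd_flags)
{
	return (pd_flags & PD_CHECKSUM1) != 0 &&
		   (pd_flags & PD_CHECKSUM2) == 0;
}
```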

>   * PageGetPageSize
>   *            Returns the page size of a page.
>   *
> !  * Since PageSizeIsValid() when pagesize == BLCKSZ, just written BLCKSZ.
> !  * This can be called on any page, initialised or not, in or out of buffers.
> !  * You might think this can vary at runtime but you'd be wrong, since pages
> !  * frequently need to occupy buffers and pages are copied from one to 
> another
> !  * so there are many hidden assumptions that this simple definition is true.
>   */
> ! #define PageGetPageSize(page) (BLCKSZ)
>
> I think I agree that the current definition of PageGetPageSize seems
> unlikely to come anywhere close to allowing us to cope with multiple
> page sizes, but I think this method of disabling it is a hack.  The
> callers that want to know how big the page really is should just use
> BLCKSZ instead of this macro, and those that want to know how big the
> page THINKS it is (notably contrib/pageinspect) need a way to get that
> information.

Fair comment.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] wal_buffers

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
 wrote:
> On 19-02-2012 02:24, Robert Haas wrote:
>> I have attached tps scatterplots.  The obvious conclusion appears to
>> be that, with only 16MB of wal_buffers, the buffer "wraps around" with
>> some regularity
>>
> Wouldn't it be useful to print a message to the log when we get a "wrap around"?
> That way we would have an idea that wal_buffers needs to be increased.

I was thinking about that.  I think that what might be more useful
than a log message is a counter somewhere in shared memory.  Logging
imposes a lot of overhead, which is exactly what we don't want here,
and the volume might be quite high on a system that is bumping up
against this problem.  Of course then the question is... how would we
expose the counter value?
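A minimal sketch of the counter idea, using C11 atomics rather than PostgreSQL's actual shared-memory or statistics machinery (WalBufStats, ReportWalBufferFullWait, and GetWalBufferFullWaits are hypothetical names, not real APIs): a relaxed atomic increment keeps the hot path cheap, and a monitoring view could read the total at leisure:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared-memory stats block. */
typedef struct WalBufStats
{
	atomic_ulong wal_buffer_full_waits;
} WalBufStats;

/*
 * Hypothetically called from the WAL-write path when a backend must
 * wait because all wal_buffers pages are dirty and unflushed.  A
 * relaxed increment is far cheaper than emitting a log line.
 */
static inline void
ReportWalBufferFullWait(WalBufStats *stats)
{
	atomic_fetch_add_explicit(&stats->wal_buffer_full_waits, 1,
							  memory_order_relaxed);
}

/* Something like a pg_stat_* view would simply read the counter. */
static inline unsigned long
GetWalBufferFullWaits(WalBufStats *stats)
{
	return atomic_load_explicit(&stats->wal_buffer_full_waits,
								memory_order_relaxed);
}
```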

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade --logfile option documentation

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 6:13 AM, Peter Eisentraut  wrote:
> The documentation of the pg_upgrade -l/--logfile option never made much
> sense to me:
>
>  -l, --logfile=FILENAME        log session activity to file
>
> I don't know what "session" means for pg_upgrade, so I never used it.
>
> What it actually does is log the output of all the programs that
> pg_upgrade calls internally, such as pg_ctl, psql, vacuumdb,
> pg_resetxlog, to the specified file, which is quite useful for analyzing
> errors such as
>
> unable to connect to new postmaster started with the command: 
> "/usr/lib/postgresql/9.1/bin/pg_ctl" -w -l "/dev/null" -D 
> "/var/lib/postgresql/9.1/main" -o "-p 5433 -b" start >> "/dev/null" 2>&1
>
> where -l would have put something in the place of /dev/null.
>
> So what might be a better wording for this option?  Something like "log
> output of internally called programs to file"?

I don't think we should be that specific, because we might someday
want pg_upgrade itself to write messages to that file as well, even if
it doesn't today.  I agree that the phrase "session activity" is a bit
misleading.

As a more general comment, I think that the way pg_upgrade does
logging right now is absolutely terrible.  IME, it is utterly
impossible to understand what has gone wrong with pg_upgrade without
looking at the log file.  And by default, no log file is created.  So
typically what happens is:

- I run pg_upgrade.  It fails.
- I rename the control file from the old cluster back to its original name.
- I rerun pg_upgrade, this time with -l.  It fails again.
- I read the log file, figure out what the problem is, and correct it.
- I rename the control file from the old cluster back to its original
name, again.
- I run pg_upgrade a third time.
- On a good day, it works, else go to step 5.

One pretty obvious improvement would be: if pg_upgrade fails after
renaming the control file for the old cluster out of the way - say,
while loading the schema dump into the new cluster - have it RENAME
THE OLD CONTROL FILE BACK before exiting.  But I also think the
logging needs improvement.  Right now, we studiously redirect both
stdout and stderr to /dev/null; maybe it would be better to redirect
stdout to /dev/null and NOT redirect stderr.  If that generates too
much chatter in non-failure cases, then let's adjust the output of the
commands pg_upgrade is invoking until it doesn't.  The actual cause of
the failure, rather than pg_upgrade's fairly-useless gloss on it,
ought to be visible right away, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] wal_buffers

2012-02-19 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
>  wrote:
>> Wouldn't it be useful to print a message to the log when we get a "wrap around"?
>> That way we would have an idea that wal_buffers needs to be increased.

> I was thinking about that.  I think that what might be more useful
> than a log message is a counter somewhere in shared memory.  Logging
> imposes a lot of overhead, which is exactly what we don't want here,
> and the volume might be quite high on a system that is bumping up
> against this problem.  Of course then the question is... how would we
> expose the counter value?

Why do you need a counter, other than the current LSN?  Surely the
number of WAL buffer ring cycles can be deduced directly from that.
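Tom's arithmetic, sketched (the function name is hypothetical): the LSN counts total bytes of WAL ever written, and the buffers form a ring, so complete trips around the ring are just total bytes divided by the ring size, with no extra bookkeeping:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Full trips around the wal_buffers ring implied by a byte position in
 * the WAL stream.  Because the LSN increases monotonically, this can be
 * deduced at any time from the LSN alone.
 */
static inline uint64_t
WalRingCycles(uint64_t lsn_bytes, uint64_t wal_buffers_bytes)
{
	return lsn_bytes / wal_buffers_bytes;
}
```

Note this counts cycles, not waits: it gives at best an upper bound on how often a backend blocked flushing buffers, which is the distinction raised in the follow-up.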

regards, tom lane



Re: [HACKERS] wal_buffers

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 6:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
>>  wrote:
>>> Wouldn't it be useful to print a message to the log when we get a
>>> "wrap around"?
>>> That way we would have an idea that wal_buffers needs to be increased.
>
>> I was thinking about that.  I think that what might be more useful
>> than a log message is a counter somewhere in shared memory.  Logging
>> imposes a lot of overhead, which is exactly what we don't want here,
>> and the volume might be quite high on a system that is bumping up
>> against this problem.  Of course then the question is... how would we
>> expose the counter value?
>
> Why do you need a counter, other than the current LSN?  Surely the
> number of WAL buffer ring cycles can be deduced directly from that.

The problem isn't how many times it's cycled; the issue is whether
there was a wait induced by needing to flush WAL buffers because of
too many writes. You can't count those waits in the way you suggest.
You can calculate an upper limit on them, but that's not very useful.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Simon Riggs
Recent changes for power reduction mean that we now issue a wakeup
call to the bgwriter every time we set a hint bit.

However cheap that is, it's still overkill.

My proposal is that we wakeup the bgwriter whenever a backend is
forced to write a dirty buffer, a job the bgwriter should have been
doing.

This significantly reduces the number of wakeup calls and allows the
bgwriter to stay asleep even when very light traffic happens, which is
good because the bgwriter is often the last process to sleep.

Seems useful to have an explicit discussion on this point, especially
in view of recent performance results.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1adb6d3..310cd95 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -654,6 +654,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 FlushBuffer(buf, NULL);
 LWLockRelease(buf->content_lock);
 
+/* The bgwriter may need to be woken. */
+if (ProcGlobal->bgwriterLatch)
+	SetLatch(ProcGlobal->bgwriterLatch);
+
 TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(forkNum, blockNum,
 			   smgr->smgr_rnode.node.spcNode,
 smgr->smgr_rnode.node.dbNode,
@@ -2368,9 +2372,6 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
 			VacuumPageDirty++;
 			if (VacuumCostActive)
 VacuumCostBalance += VacuumCostPageDirty;
-			/* The bgwriter may need to be woken. */
-			if (ProcGlobal->bgwriterLatch)
-SetLatch(ProcGlobal->bgwriterLatch);
 		}
 	}
 }



[HACKERS] patch: autocomplete for functions

2012-02-19 Thread Pavel Stehule
Hello

I think this extremely simple patch should be useful.

It helps with the pattern SELECT fx();

There was a thread about it.

 Regards

Pavel
*** ./src/bin/psql/tab-complete.c.orig	2012-02-05 11:28:48.0 +0100
--- ./src/bin/psql/tab-complete.c	2012-02-19 20:05:05.241626625 +0100
***
*** 2555,2562 
  		COMPLETE_WITH_CONST("IS");
  
  /* SELECT */
! 	/* naah . . . */
! 
  /* SET, RESET, SHOW */
  	/* Complete with a variable name */
  	else if ((pg_strcasecmp(prev_wd, "SET") == 0 &&
--- 2555,2562 
  		COMPLETE_WITH_CONST("IS");
  
  /* SELECT */
! 	else if (pg_strcasecmp(prev_wd, "SELECT") == 0)
! 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
  /* SET, RESET, SHOW */
  	/* Complete with a variable name */
  	else if ((pg_strcasecmp(prev_wd, "SET") == 0 &&



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 4:35 PM, Simon Riggs  wrote:

> We don't need to wait until nobody has it set, we just need to wait
> for the people that had it set when we first checked to be out of that
> state momentarily.

I've just finished doing some performance analysis on various aspects
of hint bit setting for this patch.

I've seen as high as 14% of transactions writing full pages during a
pgbench run. That sounds quite bad, but on pgbench at least all of
those are associated with UPDATEs, which would dirty the page anyway,
so there aren't any more full page writes overall.

Checkpoints would be delayed only until a virtual transaction ends or
a virtual transaction comes out of DelayCkpt state. If a virtual
transaction was long running it wouldn't spend much time in the
delaying state, especially if we take into account I/O requirements.

So although I'm not exactly happy with the overheads, they don't seem
to be as big a problem as they sound. Plus this is all optional and
avoidable.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] patch: CREATE OR REPLACE FUNCTION autocomplete

2012-02-19 Thread Pavel Stehule
Hello

Here is another very simple patch: it enhances tab completion to
support the CREATE OR REPLACE FUNCTION statement.

Regards

Pavel Stehule
*** ./src/bin/psql/tab-complete.c.orig	2012-02-19 20:05:05.0 +0100
--- ./src/bin/psql/tab-complete.c	2012-02-19 20:20:43.817202512 +0100
***
*** 644,649 
--- 644,650 
  	{"INDEX", NULL, &Query_for_list_of_indexes},
  	{"OPERATOR", NULL, NULL},	/* Querying for this is probably not such a
   * good idea. */
+ 	{"OR REPLACE FUNCTION", NULL, &Query_for_list_of_functions},
  	{"OWNED", NULL, NULL, THING_NO_CREATE},		/* for DROP OWNED BY ... */
  	{"PARSER", Query_for_list_of_ts_parsers, NULL, THING_NO_SHOW},
  	{"ROLE", Query_for_list_of_roles},



Re: [HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs  wrote:
> Recent changes for power reduction mean that we now issue a wakeup
> call to the bgwriter every time we set a hint bit.
>
> However cheap that is, its still overkill.
>
> My proposal is that we wakeup the bgwriter whenever a backend is
> forced to write a dirty buffer, a job the bgwriter should have been
> doing.
>
> This significantly reduces the number of wakeup calls and allows the
> bgwriter to stay asleep even when very light traffic happens, which is
> good because the bgwriter is often the last process to sleep.
>
> Seems useful to have an explicit discussion on this point, especially
> in view of recent performance results.

I don't see what this has to do with recent performance results, so
please elaborate.  Off-hand, I don't see any point in getting cheap.
It seems far more important to me that the background writer become
active when needed than that we save some trivial amount of power by
waiting longer before activating it.  If we're concerned about saving
power, then IMHO what we should be worried about is that the wal
writer is still waking up 5x/s.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas  wrote:
> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs  wrote:
>> Recent changes for power reduction mean that we now issue a wakeup
>> call to the bgwriter every time we set a hint bit.
>>
>> However cheap that is, its still overkill.
>>
>> My proposal is that we wakeup the bgwriter whenever a backend is
>> forced to write a dirty buffer, a job the bgwriter should have been
>> doing.
>>
>> This significantly reduces the number of wakeup calls and allows the
>> bgwriter to stay asleep even when very light traffic happens, which is
>> good because the bgwriter is often the last process to sleep.
>>
>> Seems useful to have an explicit discussion on this point, especially
>> in view of recent performance results.
>
> I don't see what this has to do with recent performance results, so
> please elaborate.  Off-hand, I don't see any point in getting cheap.
> It seems far more important to me that the background writer become
> active when needed than that we save some trivial amount of power by
> waiting longer before activating it.

Then you misunderstand, since I am advocating waking it when needed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Improve pretty printing of viewdefs.

2012-02-19 Thread Andrew Dunstan

On 02/19/2012 04:18 PM, Andrew Dunstan wrote:
> [redirecting to -hackers]

Arghh, this time redirecting ...

On 02/19/2012 12:04 PM, Pavel Stehule wrote:
> Hello
>
> nice
>
> should be this functionality used for query too?
>
> some like
>
> pg_pretty_query('SELECT ... ', 80)
>
> when we have this functionality.

It would probably be possible to leverage some of this for that, but
it's certainly not part of the present piece of work. All the logic is
there in get_query_def() and friends. There would need to be a wrapper
that called the parser to get a query object from the input string and
then called get_query_def() to get back the reformatted output, and
there are probably any number of wrinkles I haven't thought of.


cheers

andrew





Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 11:35 AM, Simon Riggs  wrote:
>> Doesn't this seem awfully bad for performance on Hot Standby servers?
>> I agree that it fixes the problem with un-WAL-logged pages there, but
>> I seem to recall some recent complaining about performance features
>> that work on the master but not the standby.  Durable hint bits are
>> one such feature.
>
> It's impossible for it to work, in this case, since we cannot write
> new WAL to prevent torn pages.
>
> Note that hint bit setting on a dirty block is allowed, so many hints
> will still be set in Hot Standby.

To me, it seems that you are applying a double standard.  You have
twice attempted to insist that I do extra work to make major features
that I worked on - unlogged tables and index-only scans - work in Hot
Standby mode, despite the existence of significant technological
obstacles.  But when it comes to your own feature, you simply state
that it cannot be done, and therefore we need not do it.   Of course,
this feature, like those, CAN be made to work.  It just involves
solving difficult problems that have little to do with the primary
purpose of the patch.  To be honest, I don't use Hot Standby enough to
care very much about this particular issue, and I'm not saying we
should reject it on these grounds.  But I do think it's a mistake to
dismiss it entirely, since every limitation of Hot Standby narrows the
set of cases in which it can be deployed.  And at any rate, I want the
same standard applied to my work as to yours.

>> I am slightly worried that this expansion in the use of this mechanism
>> (formerly called inCommit, for those following along at home) could
>> lead to checkpoint starvation.  Suppose we've got one or two large
>> table scans wandering along, setting hint bits, and now suddenly it's
>> time to checkpoint.  How long will it take the checkpoint process to
>> find a time when nobody's got delayChkpt set?
>
> We don't need to wait until nobody has it set, we just need to wait
> for the people that had it set when we first checked to be out of that
> state momentarily.

Ah... good point.

>> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
>> doesn't have a checksum, PD_CHECKSUM2 is not set?  What good does that
>> do?
>
> As explained in detailed comments, the purpose of this is to implement
> Heikki's suggestion that we have a bit set to zero so we can detect
> failures that cause a run of 1s.

I think it's nonsensical to pretend that there's anything special
about that particular bit.  If we want to validate the page header
before trusting the lack of a checksum bit, we can do that far more
thoroughly than just checking that one bit.  There are a whole bunch
of bits that ought to always be zero, and there are other things we
can validate as well (e.g. LSN not in future).  If we're concerned
about the checksum-enabled bit getting flipped (and I agree that we
should be), we can check some combination of that stuff in the hope of
catching it, and that'll be a far better guard than just checking one
arbitrarily selected bit.

That having been said, I don't feel very good about the idea of
relying on the contents of the page to tell us whether or not the page
has a checksum.  There's no guarantee that an error that flips the
has-checksum bit will flip any other bit on the page, or that it won't
flip everything else we're relying on as a backstop in exactly the
manner that foils whatever algorithm we put in place.  Random
corruption is, perhaps, unlikely to do that, but somehow I feel like
it'll happen more often than random chance suggests.  Things never
fail the way you want them to.

Another disadvantage of the current scheme is that there's no
particularly easy way to know that your whole cluster has checksums.
No matter how we implement checksums, you'll have to rewrite every
table in the cluster in order to get them fully turned on.  But with
the current design, there's no easy way to know how much of the
cluster is actually checksummed.  If you shut checksums off, they'll
linger until those pages are rewritten, and there's no easy way to
find the relations from which they need to be removed, either.

I'm tempted to suggest a relation-level switch: when you want
checksums, you use ALTER TABLE to turn them on, and when you don't
want them any more you use ALTER TABLE to shut them off again, in each
case rewriting the table.  That way, there's never any ambiguity about
what's in the data pages in a given relation: either they're
all checksummed, or none of them are.  This moves the decision about
whether checksums are enabled or disabled quite a bit further away
from the data itself, and also allows you to determine (by catalog
inspection) which parts of the cluster do in fact have checksums.  It
might be kind of a pain to implement, though: you'd have to pass the
information about how any given relation was configured down to the
place where we validate page sanity.  I'm not sure whether that's
practical.

Re: [HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 4:11 PM, Simon Riggs  wrote:
> On Sun, Feb 19, 2012 at 8:15 PM, Robert Haas  wrote:
>> On Sun, Feb 19, 2012 at 1:53 PM, Simon Riggs  wrote:
>>> Recent changes for power reduction mean that we now issue a wakeup
>>> call to the bgwriter every time we set a hint bit.
>>>
>>> However cheap that is, its still overkill.
>>>
>>> My proposal is that we wakeup the bgwriter whenever a backend is
>>> forced to write a dirty buffer, a job the bgwriter should have been
>>> doing.
>>>
>>> This significantly reduces the number of wakeup calls and allows the
>>> bgwriter to stay asleep even when very light traffic happens, which is
>>> good because the bgwriter is often the last process to sleep.
>>>
>>> Seems useful to have an explicit discussion on this point, especially
>>> in view of recent performance results.
>>
>> I don't see what this has to do with recent performance results, so
>> please elaborate.  Off-hand, I don't see any point in getting cheap.
>> It seems far more important to me that the background writer become
>> active when needed than that we save some trivial amount of power by
>> waiting longer before activating it.
>
> Then you misunderstand, since I am advocating waking it when needed.

Well, I guess that depends on when it's actually needed.  You haven't
presented any evidence one way or the other.

I mean, let's suppose that a sudden spike of activity hits a
previously-idle system.  If we wait until all of shared_buffers is
dirty before waking up the background writer, it seems possible that
the background writer is going to have a hard time catching up.  If we
wake it immediately, we don't have that problem.

Also, in general, I think that it's not a good idea to let dirty data
sit in shared_buffers forever.  I'm unhappy about the change this
release cycle to skip checkpoints if we've written less than a full
WAL segment, and this seems like another step in that direction.  It's
exposing us to needless risk of data loss.  In 9.1, if you process a
transaction and, an hour later, the disk where pg_xlog is written
melts into a heap of molten slag, your transaction will be there, even
if you end up having to run pg_resetxlog.  In 9.2, it may well be that
xlog contains the only record of that transaction, and you're hosed.
The more work we do to postpone writing the data until the absolutely
last possible moment, the more likely it is that it won't be on disk
when we need it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] leakproof

2012-02-19 Thread Andrew Dunstan
I missed all the fun while the "leakproof" addition to function 
attributes was being decided, so I know I'm late to the party. Today I 
had to go and look up what it actually meant. I have to say that I was a 
bit surprised. I expected it to refer to memory management in some way. 
I don't honestly think "leakproof" as a term is going to convey much to 
lots of people. Can we come up with a more descriptive term?


cheers

andrew



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 19 February 2012 15:49, Tom Lane  wrote:
> That sounds great.
>
> BTW, if you don't have it already, I'd highly recommend getting a copy
> of Friedl's "Mastering Regular Expressions".  It's aimed at users not
> implementers, but there is a wealth of valuable context information in
> there, as well as a really good not-too-technical overview of typical
> implementation techniques for RE engines.  You'd probably still want one
> of the more academic presentations such as the dragon book for
> reference, but I think Freidl's take on it is extremely useful.

Thanks for the recommendations Tom.  I've now got Friedl, and there's
a dead-tree copy of 'Compilers' making its gradual way to me (no
ebook).
I've also been reading the article series by Russ Cox linked upthread
-- it's good stuff.

Are you far enough into the backrefs bug that you'd prefer to see it
through, or would you like me to pick it up?

Cheers,
BJ



Re: [HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Jeff Janes
On Sun, Feb 19, 2012 at 2:18 PM, Robert Haas  wrote:
>
> Also, in general, I think that it's not a good idea to let dirty data
> sit in shared_buffers forever.  I'm unhappy about the change this
> release cycle to skip checkpoints if we've written less than a full
> WAL segment, and this seems like another step in that direction.  It's
> exposing us to needless risk of data loss.  In 9.1, if you process a
> transaction and, an hour later, the disk where pg_xlog is written
> melts into a heap of molten slag, your transaction will be there, even
> if you end up having to run pg_resetxlog.

Would the log really have been archived in 9.1?  I don't think
checkpoint_timeout caused a log switch, just a checkpoint which could
happily be in the same file as the previous checkpoint.

> In 9.2, it may well be that
> xlog contains the only record of that transaction, and you're hosed.
> The more work we do to postpone writing the data until the absolutely
> last possible moment, the more likely it is that it won't be on disk
> when we need it.

Isn't that what archive_timeout is for?

Should archive_timeout default to something like 5 min, rather than 0?

Cheers,

Jeff



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 10:04 PM, Robert Haas  wrote:

> To me, it seems that you are applying a double standard.  You have
> twice attempted to insist that I do extra work to make major features
> that I worked on - unlogged tables and index-only scans - work in Hot
> Standby mode, despite the existence of significant technological
> obstacles.  But when it comes to your own feature, you simply state
> that it cannot be done, and therefore we need not do it.   Of course,
> this feature, like those, CAN be made to work.

Vitriol aside, if you would be so kind as to explain how it is
possible, as you claim, I'll look into making it work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Brendan Jurd  writes:
> Are you far enough into the backrefs bug that you'd prefer to see it
> through, or would you like me to pick it up?

Actually, what I've been doing today is a brain dump.  This code is
never going to be maintainable by anybody except its original author
without some internals documentation, so I've been trying to write
some based on what I've managed to reverse-engineer so far.  It's
not very complete, but I do have some words about the DFA/NFA stuff,
which I will probably revise and fill in some more as I work on the
backref fix, because that's where that bug lives.  I have also got
a bunch of text about the colormap management code, which I think
is interesting right now because that is what we are going to have
to fix if we want decent performance for Unicode \w and related
classes (cf the other current -hackers thread about regexes).
I was hoping to prevail on you to pick that part up as your first
project.  I will commit what I've got in a few minutes --- look
for src/backend/regex/README in that commit.  I encourage you to
add to that file as you figure stuff out.  We could stand to upgrade
a lot of the code comments too, of course, but I think a narrative
description is pretty useful before diving into code.

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Simon Riggs
On Sun, Feb 19, 2012 at 10:04 PM, Robert Haas  wrote:

>> As explained in detailed comments, the purpose of this is to implement
>> Heikki's suggestion that we have a bit set to zero so we can detect
>> failures that cause a run of 1s.
>
> I think it's nonsensical to pretend that there's anything special
> about that particular bit.  If we want to validate the page header
> before trusting the lack of a checksum bit, we can do that far more
> thoroughly than just checking that one bit.  There are a whole bunch
> of bits that ought to always be zero, and there are other things we
> can validate as well (e.g. LSN not in future).  If we're concerned
> about the checksum-enabled bit getting flipped (and I agree that we
> should be), we can check some combination of that stuff in the hope of
> catching it, and that'll be a far better guard than just checking one
> arbitrarily selected bit.

I thought it was a reasonable and practical idea from Heikki. The bit
is not selected arbitrarily, it is by design adjacent to one of the
other bits. So overall, 3 bits need to be set to a precise value and a
run of 1s or 0s will throw an error.

> That having been said, I don't feel very good about the idea of
> relying on the contents of the page to tell us whether or not the page
> has a checksum.  There's no guarantee that an error that flips the
> has-checksum bit will flip any other bit on the page, or that it won't
> flip everything else we're relying on as a backstop in exactly the
> manner that foils whatever algorithm we put in place.  Random
> corruption is, perhaps, unlikely to do that, but somehow I feel like
> it'll happen more often than random chance suggests.  Things never
> fail the way you want them to.

You're right. This patch is not the best possible world, given a clean
slate. But we don't have a clean slate.

The fact is this patch will detect corruptions pretty well and that's
what Postgres users want.

While developing this, many obstacles could have been blockers. I
think we're fairly lucky that I managed to find a way through the
minefield.

> Another disadvantage of the current scheme is that there's no
> particularly easy way to know that your whole cluster has checksums.
> No matter how we implement checksums, you'll have to rewrite every
> table in the cluster in order to get them fully turned on.  But with
> the current design, there's no easy way to know how much of the
> cluster is actually checksummed.

You can read every block and check.

> If you shut checksums off, they'll
> linger until those pages are rewritten, and there's no easy way to
> find the relations from which they need to be removed, either.

We can't have it both ways. Either we have an easy upgrade, or we
don't. I'm told that an easy upgrade is essential.

> I'm tempted to suggest a relation-level switch: when you want
> checksums, you use ALTER TABLE to turn them on, and when you don't
> want them any more you use ALTER TABLE to shut them off again, in each
> case rewriting the table.  That way, there's never any ambiguity about
> what's in the data pages in a given relation: either they're
> all checksummed, or none of them are.  This moves the decision about
> whether checksums are enabled or disabled quite a bit further away
> from the data itself, and also allows you to determine (by catalog
> inspection) which parts of the cluster do in fact have checksums.  It
> might be kind of a pain to implement, though: you'd have to pass the
> information about how any given relation was configured down to the
> place where we validate page sanity.  I'm not sure whether that's
> practical.

It's not practical as the only mechanism, given the requirement for upgrade.

As I mention in the docs, if you want that, use VACUUM FULL. So there
is a mechanism if you want it.

> I also think that the question of where exactly the checksum ought to
> be put might bear some more thought.  Tom expressed some concern about
> stealing the page version field, and it seems to me we could work
> around that by instead stealing something less valuable.  The top
> eight bits of the pd_pagesize_version field probably meet that
> criteria, since in practice basically everybody uses 8K blocks, and
> the number of errors that will be caught by checksums is probably much
> larger than the number of errors that will be caught by the page-size
> cross-check.  But maybe the other 8 bits should come from somewhere
> else, maybe pd_tli or pd_flags.  For almost all practical purposes,
> storing only the low-order 8 bits of the TLI ought to provide just as
> much of a safety check as storing the low-order 16 bits, so I think
> that might be the way to go.  We could set it up so that we always
> check only the low 8 bits against the TLI, regardless of whether
> checksums are enabled; then we basically free up that bit space at no
> cost in code complexity.
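For reference, the bit-space reshuffling proposed in the paragraph quoted above -- steal the top 8 bits of the pagesize/version word and the top 8 bits of the TLI field to hold a 16-bit checksum, keeping the low 8 bits of each for their original cross-checks -- might be packed roughly like this.  The field names and exact layout here are assumptions for illustration, not the actual on-disk format:

```c
#include <stdint.h>

typedef struct
{
    uint16_t    pd_pagesize_version;    /* low 8 bits keep version info */
    uint16_t    pd_tli;                 /* low 8 bits keep TLI check */
} FakeHeader;

/* Split the 16-bit checksum across the two stolen high bytes. */
static void
store_checksum(FakeHeader *h, uint16_t sum)
{
    h->pd_pagesize_version = (h->pd_pagesize_version & 0x00FF)
                           | (sum & 0xFF00);
    h->pd_tli = (h->pd_tli & 0x00FF) | (uint16_t) (sum << 8);
}

/* Reassemble the checksum from the two high bytes. */
static uint16_t
load_checksum(const FakeHeader *h)
{
    return (h->pd_pagesize_version & 0xFF00) | (h->pd_tli >> 8);
}
```

The point of the sketch is only that a 16-bit value round-trips while each field's low byte -- the residual version and TLI cross-checks -- survives untouched.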

The version stuff is catered for by the current patch.

TLI isn't something I want

Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-19 Thread Jan Urbański
On 18/02/12 21:18, Jan Urbański wrote:
> On 18/02/12 21:17, Tom Lane wrote:
>> Jan Urbański writes:
>>> On 18/02/12 20:30, Tom Lane wrote:
 Dave Malcolm at Red Hat has been working on a static code analysis tool
 for Python-related C code.  He reports here on some preliminary results
 for plpython.c:
 https://bugzilla.redhat.com/show_bug.cgi?id=795011
>>
>> If you find any live bugs, it'd likely be better to deal with them as
>> a separate patch so that we can back-patch ...
> 
> Sure, I meant to say I'll look at these as well, but will make them into
> a separate patch.

Here's a patch that fixes everything I was sure was an actual bug. The
rest of the warnings seem to be caused by the tool not knowing that
elog(ERROR) throws a longjmp and things like "we never unref this
object, so it can't disappear mid-execution".

Attached are patches for HEAD and for 9.1.x (since splitting plpython.c
in 9.2 was kind of my idea I felt bad about you having to back-patch
this so I tried to do the necessary legwork myself; I hope the attached
is what you need).

BTW, that tool is quite handy, I'll have to try running it over psycopg2.

Cheers,
Jan
diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c
index 530b5f0..2b064c5 100644
*** a/src/pl/plpython/plpython.c
--- b/src/pl/plpython/plpython.c
*** PLyList_FromArray(PLyDatumToOb *arg, Dat
*** 2345,2350 
--- 2345,2352 
  	length = ARR_DIMS(array)[0];
  	lbound = ARR_LBOUND(array)[0];
  	list = PyList_New(length);
+ 	if (list == NULL)
+ 		elog(ERROR, "could not transform Python list to array");
  
  	for (i = 0; i < length; i++)
  	{
*** PLy_spi_execute_query(char *query, long
*** 3664,3670 
  	int			rv;
  	volatile MemoryContext oldcontext;
  	volatile ResourceOwner oldowner;
! 	PyObject   *ret;
  
  	oldcontext = CurrentMemoryContext;
  	oldowner = CurrentResourceOwner;
--- 3666,3672 
  	int			rv;
  	volatile MemoryContext oldcontext;
  	volatile ResourceOwner oldowner;
! 	PyObject   *ret = NULL;
  
  	oldcontext = CurrentMemoryContext;
  	oldowner = CurrentResourceOwner;
*** PLy_spi_execute_query(char *query, long
*** 3727,3732 
--- 3729,3735 
  
  	if (rv < 0)
  	{
+ 		Py_XDECREF(ret);
  		PLy_exception_set(PLy_exc_spi_error,
  		  "SPI_execute failed: %s",
  		  SPI_result_code_string(rv));
*** PLy_generate_spi_exceptions(PyObject *mo
*** 3967,3973 
--- 3970,3982 
  		PyObject   *sqlstate;
  		PyObject   *dict = PyDict_New();
  
+ 		if (dict == NULL)
+ 			elog(ERROR, "could not generate SPI exceptions");
+ 
  		sqlstate = PyString_FromString(unpack_sql_state(exception_map[i].sqlstate));
+ 		if (sqlstate == NULL)
+ 			elog(ERROR, "could not generate SPI exceptions");
+ 
  		PyDict_SetItemString(dict, "sqlstate", sqlstate);
  		Py_DECREF(sqlstate);
  		exc = PyErr_NewException(exception_map[i].name, base, dict);
*** PLy_add_exceptions(PyObject *plpy)
*** 4008,4013 
--- 4017,4027 
  	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
  	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
  
+ 	if (PLy_exc_error == NULL ||
+ 		PLy_exc_fatal == NULL ||
+ 		PLy_exc_spi_error == NULL)
+ 		elog(ERROR, "could not create the base SPI exceptions");
+ 
  	Py_INCREF(PLy_exc_error);
  	PyModule_AddObject(plpy, "Error", PLy_exc_error);
  	Py_INCREF(PLy_exc_fatal);
*** PLy_init_interp(void)
*** 4124,4129 
--- 4138,4145 
  	Py_INCREF(mainmod);
  	PLy_interp_globals = PyModule_GetDict(mainmod);
  	PLy_interp_safe_globals = PyDict_New();
+ 	if (PLy_interp_safe_globals == NULL)
+ 		PLy_elog(ERROR, "could not create globals");
  	PyDict_SetItemString(PLy_interp_globals, "GD", PLy_interp_safe_globals);
  	Py_DECREF(mainmod);
  	if (PLy_interp_globals == NULL || PyErr_Occurred())
*** PLy_init_plpy(void)
*** 4164,4169 
--- 4180,4187 
  	main_mod = PyImport_AddModule("__main__");
  	main_dict = PyModule_GetDict(main_mod);
  	plpy_mod = PyImport_AddModule("plpy");
+ 	if (plpy_mod == NULL)
+ 		elog(ERROR, "could not initialize plpy");
  	PyDict_SetItemString(main_dict, "plpy", plpy_mod);
  	if (PyErr_Occurred())
  		elog(ERROR, "could not initialize plpy");
*** PLy_output(volatile int level, PyObject
*** 4231,4238 
  		 * decoration.
  		 */
  		PyObject   *o;
  
- 		PyArg_UnpackTuple(args, "plpy.elog", 1, 1, &o);
  		so = PyObject_Str(o);
  	}
  	else
--- 4249,4260 
  		 * decoration.
  		 */
  		PyObject   *o;
+ 		int			result;
+ 
+ 		result = PyArg_UnpackTuple(args, "plpy.elog", 1, 1, &o);
+ 		if (!result)
+ 			elog(ERROR, "could not unpack arguments in plpy.elog");
  
  		so = PyObject_Str(o);
  	}
  	else
*** get_source_line(const char *src, int lin
*** 4554,4559 
--- 4576,4585 
  	const char *next = src;
  	int			current = 0;
  
+ 	/* sanity check */
+ 	if (lineno <= 0)
+ 		return NULL;
+ 
  	while (current < lineno)
  	{
  	

Re: [HACKERS] pl/python long-lived allocations in datum->dict transformation

2012-02-19 Thread Jan Urbański
On 14/02/12 01:35, Tom Lane wrote:
> Jan Urbański writes:
>> It's not very comfortable, but
>> I think PLyDict_FromTuple can be allowed to be non-reentrant.
> 
> I think that's pretty short-sighted.  Even if it's safe today (which
> I am not 100% convinced of), there are plenty of foreseeable reasons
> why it might^Wwill break in the future.
> 
>> OTOH if we want to make it reentrant, some more tinkering would be in order.
> 
> I think that's in order.

Here are the results of the tinkering.

I came up with a stack of context structures that gets pushed when a
PL/Python starts being executed and popped when it returns. At first
they contained just a scratch memory context used by PLyDict_FromTuple.
Then under the premise of confirming the usefulness of introducing such
contexts I removed the global PLy_curr_procedure variable and changed
all users to get the current procedure from the context. It seems to
have worked, so the total count of global variables is unchanged - hooray!
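The per-call context stack described above -- push a context on entry to a PL/Python function, pop it on return, and fetch the current procedure from the top instead of from a global -- can be modeled minimally like this.  All names are invented; the real patch stores a procedure struct and a scratch memory context, not a string:

```c
#include <stddef.h>

typedef struct ExecContext
{
    const char *curr_proc;          /* stand-in for the procedure struct */
    struct ExecContext *next;       /* link to the caller's context */
} ExecContext;

static ExecContext *context_stack = NULL;

/* Entering a function: push a context carrying its procedure. */
static void
push_context(ExecContext *ctx, const char *proc)
{
    ctx->curr_proc = proc;
    ctx->next = context_stack;
    context_stack = ctx;
}

/* What callers use instead of the old global variable. */
static const char *
current_proc(void)
{
    return context_stack ? context_stack->curr_proc : NULL;
}

/* Returning from a function: pop back to the caller's context. */
static void
pop_context(void)
{
    context_stack = context_stack->next;
}
```

Reentrancy falls out naturally: a nested call pushes its own context, and the outer call's procedure is restored on pop.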

While testing I found one more leak, this time caused by allocating a
structure for caching array type I/O functions and never freeing it.
Attached as separate patch.

Cheers,
Jan
diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 4226dc7..46930b0 100644
*** a/src/pl/plpython/plpy_cursorobject.c
--- b/src/pl/plpython/plpy_cursorobject.c
***
*** 14,19 
--- 14,20 
  #include "plpy_cursorobject.h"
  
  #include "plpy_elog.h"
+ #include "plpy_main.h"
  #include "plpy_planobject.h"
  #include "plpy_procedure.h"
  #include "plpy_resultobject.h"
*** PLy_cursor_query(const char *query)
*** 121,126 
--- 122,128 
  	{
  		SPIPlanPtr	plan;
  		Portal		portal;
+ 		PLyExecutionContext	*exec_ctx = PLy_current_execution_context();
  
  		pg_verifymbstr(query, strlen(query), false);
  
*** PLy_cursor_query(const char *query)
*** 129,136 
  			elog(ERROR, "SPI_prepare failed: %s",
   SPI_result_code_string(SPI_result));
  
  		portal = SPI_cursor_open(NULL, plan, NULL, NULL,
!  PLy_curr_procedure->fn_readonly);
  		SPI_freeplan(plan);
  
  		if (portal == NULL)
--- 131,140 
  			elog(ERROR, "SPI_prepare failed: %s",
   SPI_result_code_string(SPI_result));
  
+ 		Assert(exec_ctx->curr_proc != NULL);
+ 
  		portal = SPI_cursor_open(NULL, plan, NULL, NULL,
!  exec_ctx->curr_proc->fn_readonly);
  		SPI_freeplan(plan);
  
  		if (portal == NULL)
*** PLy_cursor_plan(PyObject *ob, PyObject *
*** 210,215 
--- 214,220 
  		Portal		portal;
  		char	   *volatile nulls;
  		volatile int j;
+ 		PLyExecutionContext *exec_ctx = PLy_current_execution_context();
  
  		if (nargs > 0)
  			nulls = palloc(nargs * sizeof(char));
*** PLy_cursor_plan(PyObject *ob, PyObject *
*** 252,259 
  			}
  		}
  
  		portal = SPI_cursor_open(NULL, plan->plan, plan->values, nulls,
!  PLy_curr_procedure->fn_readonly);
  		if (portal == NULL)
  			elog(ERROR, "SPI_cursor_open() failed: %s",
   SPI_result_code_string(SPI_result));
--- 257,266 
  			}
  		}
  
+ 		Assert(exec_ctx->curr_proc != NULL);
+ 
  		portal = SPI_cursor_open(NULL, plan->plan, plan->values, nulls,
!  exec_ctx->curr_proc->fn_readonly);
  		if (portal == NULL)
  			elog(ERROR, "SPI_cursor_open() failed: %s",
   SPI_result_code_string(SPI_result));
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index 741980c..9909f23 100644
*** a/src/pl/plpython/plpy_elog.c
--- b/src/pl/plpython/plpy_elog.c
***
*** 12,17 
--- 12,18 
  
  #include "plpy_elog.h"
  
+ #include "plpy_main.h"
  #include "plpy_procedure.h"
  
  
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 260,265 
--- 261,267 
  			char	   *line;
  			char	   *plain_filename;
  			long		plain_lineno;
+ 			PLyExecutionContext	*exec_ctx = PLy_current_execution_context();
  
  			/*
  			 * The second frame points at the internal function, but to mimick
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 270,276 
  			else
  fname = PyString_AsString(name);
  
! 			proname = PLy_procedure_name(PLy_curr_procedure);
  			plain_filename = PyString_AsString(filename);
  			plain_lineno = PyInt_AsLong(lineno);
  
--- 272,280 
  			else
  fname = PyString_AsString(name);
  
! 			Assert(exec_ctx->curr_proc != NULL);
! 
! 			proname = PLy_procedure_name(exec_ctx->curr_proc);
  			plain_filename = PyString_AsString(filename);
  			plain_lineno = PyInt_AsLong(lineno);
  
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 287,293 
			 * function code object was compiled with "<string>" as the
			 * filename
			 */
! 			if (PLy_curr_procedure && plain_filename != NULL &&
				strcmp(plain_filename, "<string>") == 0)
  			{
  /*
--- 291,297 
			 * function code object was compiled with "<string>" as the
  			 * filename
  			 */
! 			if (exec_ctx->curr_proc && plain_filename != N

Re: [HACKERS] leakproof

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 5:29 PM, Andrew Dunstan  wrote:
> I missed all the fun while the "leakproof" addition to function attributes
> was being decided, so I know I'm late to the party. Today I had to go and
> look up what it actually meant. I have to say that I was a bit surprised. I
> expected it to refer to memory management in some way. I don't honestly
> think "leakproof" as a term is going to convey much to lots of people. Can
> we come up with a more descriptive term?

We bikeshed on that topic a while back and nobody suggested anything
that got more than 1 or 2 votes.  But I'm still happy to rename it if
we can come up with something better, because I'm not in love with it
either.

Having now spent far too much time in bed with that patch, I'm feeling
like the concept that we are really looking for there is what some
languages call "pure" - that is, there must be no side effects,
whether by throwing exceptions or otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reducing bgwriter wakeups

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 5:56 PM, Jeff Janes  wrote:
> Would the log really have been archived in 9.1?  I don't think
> checkpoint_timeout caused a log switch, just a checkpoint which could
> happily be in the same file as the previous checkpoint.

The log segment doesn't need to get archived - it's sufficient that
the dirty buffers get written to disk.

>> In 9.2, it may well be that
>> xlog contains the only record of that transaction, and you're hosed.
>> The more work we do to postpone writing the data until the absolutely
>> last possible moment, the more likely it is that it won't be on disk
>> when we need it.
>
> Isn't that what archive_timeout is for?
>
> Should archive_timeout default to something like 5 min, rather than 0?

I dunno.  I think people doing replication are probably mostly
using streaming replication these days, in which case archive_timeout
won't matter one way or the other.  But if you're not doing
replication, your only hope of recovering from a trashed pg_xlog is
that PostgreSQL wrote the buffers and (in the case of an OS crash) the
OS wrote them to disk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 6:33 PM, Simon Riggs  wrote:
> On Sun, Feb 19, 2012 at 10:04 PM, Robert Haas  wrote:
>> To me, it seems that you are applying a double standard.  You have
>> twice attempted to insist that I do extra work to make major features
>> that I worked on - unlogged tables and index-only scans - work in Hot
>> Standby mode, despite the existence of significant technological
>> obstacles.  But when it comes to your own feature, you simply state
>> that it cannot be done, and therefore we need not do it.   Of course,
>> this feature, like those, CAN be made to work.
>
> Vitriol aside, if you would be so kind as to explain how it is
> possible, as you claim, I'll look into making it work.

It would require a double-write buffer of some kind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 20 February 2012 10:42, Tom Lane  wrote:
> I have also got
> a bunch of text about the colormap management code, which I think
> is interesting right now because that is what we are going to have
> to fix if we want decent performance for Unicode \w and related
> classes (cf the other current -hackers thread about regexes).
> I was hoping to prevail on you to pick that part up as your first
> project.  I will commit what I've got in a few minutes --- look
> for src/backend/regex/README in that commit.

Okay, I've read through your README content, it was very helpful.
I'll now go chew through some more reading material and then start
studying our existing regex source code.  Once I'm firing on all
cylinders with this stuff, I'll begin to tackle the colormap.

Cheers,
BJ



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Tom,

I did a google search, and found the following:
http://www.arglist.com/regex/

It states that Tcl uses the same library from Henry.  Maybe someone
involved with that project could help explain the library?  Also, I noticed
at the URL above a few ports people did of Henry's code.  I didn't
download and analyze their code, but maybe they have made some comments
that could help, or maybe have some improvements to the code.

Just a thought.. :)

Billy Earney

On Sun, Feb 19, 2012 at 5:42 PM, Tom Lane  wrote:

> Brendan Jurd  writes:
> > Are you far enough into the backrefs bug that you'd prefer to see it
> > through, or would you like me to pick it up?
>
> Actually, what I've been doing today is a brain dump.  This code is
> never going to be maintainable by anybody except its original author
> without some internals documentation, so I've been trying to write
> some based on what I've managed to reverse-engineer so far.  It's
> not very complete, but I do have some words about the DFA/NFA stuff,
> which I will probably revise and fill in some more as I work on the
> backref fix, because that's where that bug lives.  I have also got
> a bunch of text about the colormap management code, which I think
> is interesting right now because that is what we are going to have
> to fix if we want decent performance for Unicode \w and related
> classes (cf the other current -hackers thread about regexes).
> I was hoping to prevail on you to pick that part up as your first
> project.  I will commit what I've got in a few minutes --- look
> for src/backend/regex/README in that commit.  I encourage you to
> add to that file as you figure stuff out.  We could stand to upgrade
> a lot of the code comments too, of course, but I think a narrative
> description is pretty useful before diving into code.
>
>regards, tom lane
>
>


Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 6:57 PM, Simon Riggs  wrote:
> I thought it was a reasonable and practical idea from Heikki. The bit
> is not selected arbitrarily, it is by design adjacent to one of the
> other bits. So overall, 3 bits need to be set to a precise value and a
> run of 1s or 0s will throw an error.

Sure, but who is to say that a typical error will look anything like
that?  Anyway, you could check even more bits just as easily; we know
exactly which ones have a plausible reason for being non-zero.

>> That having been said, I don't feel very good about the idea of
>> relying on the contents of the page to tell us whether or not the page
>> has a checksum.  There's no guarantee that an error that flips the
>> has-checksum bit will flip any other bit on the page, or that it won't
>> flip everything else we're relying on as a backstop in exactly the
>> manner that foils whatever algorithm we put in place.  Random
>> corruption is, perhaps, unlikely to do that, but somehow I feel like
>> it'll happen more often than random chance suggests.  Things never
>> fail the way you want them to.
>
> You're right. This patch is not the best possible world, given a clean
> slate. But we don't have a clean slate.
>
> The fact is this patch will detect corruptions pretty well and that's
> what Postgres users want.
>
> While developing this, many obstacles could have been blockers. I
> think we're fairly lucky that I managed to find a way through the
> minefield of obstacles.

I think we could do worse than this patch, but I don't really believe
it's ready for commit.  We don't have a single performance number
showing how much of a performance regression this causes, either on
the master or on the standby, on any workload, much less those where a
problem is reasonably foreseeable from the design you've chosen.  Many
controversial aspects of the patch, such as the way you're using the
buffer header spinlocks as a surrogate for x-locking the buffer, or
the right way of obtaining the bit-space the patch requires, haven't
really been discussed, and to the extent that they have been
discussed, they have not been agreed.

On the former point, you haven't updated
src/backend/storage/buffer/README at all; but updating it is not by
itself sufficient.  Before we change the buffer-locking rules, we
ought to have some discussion of whether it's OK to do that, and why
it's necessary.  "Why it's necessary" would presumably include
demonstrating that the performance of the straightforward
implementation stinks, and that with this change is better.  You
haven't made any effort to do that whatsoever, or if you have, you
haven't posted the results here.  I'm pretty un-excited by that
change, first because I think it's a modularity violation and possibly
unsafe, and second because I believe we've already got a problem with
buffer header spinlock contention which this might exacerbate.

>> Another disadvantage of the current scheme is that there's no
>> particularly easy way to know that your whole cluster has checksums.
>> No matter how we implement checksums, you'll have to rewrite every
>> table in the cluster in order to get them fully turned on.  But with
>> the current design, there's no easy way to know how much of the
>> cluster is actually checksummed.
>
> You can read every block and check.

Using what tool?

>> If you shut checksums off, they'll
>> linger until those pages are rewritten, and there's no easy way to
>> find the relations from which they need to be removed, either.
>
> We can't have it both ways. Either we have an easy upgrade, or we
> don't. I'm told that an easy upgrade is essential.

Easy upgrade and removal of checksums are unrelated issues AFAICS.

>> I'm tempted to suggest a relation-level switch: when you want
>> checksums, you use ALTER TABLE to turn them on, and when you don't
>> want them any more you use ALTER TABLE to shut them off again, in each
>> case rewriting the table.  That way, there's never any ambiguity about
>> what's in the data pages in a given relation: they're either
>> all checksummed, or none of them are.  This moves the decision about
>> whether checksums are enabled or disabled quite a bit further away
>> from the data itself, and also allows you to determine (by catalog
>> inspection) which parts of the cluster do in fact have checksums.  It
>> might be kind of a pain to implement, though: you'd have to pass the
>> information about how any given relation was configured down to the
>> place where we validate page sanity.  I'm not sure whether that's
>> practical.
>
> It's not practical as the only mechanism, given the requirement for upgrade.

Why not?

> The version stuff is catered for by the current patch.

Your patch reuses the version number field for an unrelated purpose
and includes some vague hints of how we might do versioning using some
of the page-level flag bits.  Since bit-space is fungible, I think it
makes more sense to keep the version number where it is and ca

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Billy Earney  writes:
> I did a google search, and found the following:
> http://www.arglist.com/regex/

Hmm ... might be worth looking at those two pre-existing attempts at
making a standalone library from Henry's code, just to see what choices
they made.

> Which states that Tcl uses the same library from Henry.  Maybe someone
> involved with that project would help explain the library?

Um ... did you see the head message in this thread?

regards, tom lane



Re: [HACKERS] leakproof

2012-02-19 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 19, 2012 at 5:29 PM, Andrew Dunstan  wrote:
>> Can we come up with a more descriptive term?

> We bikeshed on that topic a while back and nobody suggested anything
> that got more than 1 or 2 votes.  But I'm still happy to rename it if
> we can come up with something better, because I'm not in love with it
> either.

> Having now spent far too much time in bed with that patch, I'm feeling
> like the concept that we are really looking for there is what some
> languages call "pure" - that is, there must be no side effects,
> whether by throwing exceptions or otherwise.

Hmm, "pure" doesn't sound bad to me.  Nice and short.

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Thanks Tom.  I looked at the code in the libraries I referred to earlier,
and it looks like the code in the regex directory is exactly the same as
Walter Waldo's version, which has at least one comment from the middle of
last decade (~2003).  Have people thought about migrating to the PCRE
library?  It seems to have a lot of neat features, and also has a JIT, and
it looks like it is being actively maintained and has decent comments.


On Sun, Feb 19, 2012 at 7:40 PM, Tom Lane  wrote:

> Billy Earney  writes:
> > I did a google search, and found the following:
> > http://www.arglist.com/regex/
>
> Hmm ... might be worth looking at those two pre-existing attempts at
> making a standalone library from Henry's code, just to see what choices
> they made.
>
> > Which states that Tcl uses the same library from Henry.  Maybe someone
> > involved with that project would help explain the library?
>
> Um ... did you see the head message in this thread?
>
>regards, tom lane
>


Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Billy,

* Billy Earney (billy.ear...@gmail.com) wrote:
> Thanks Tom.  I looked at the code in the libraries I referred to earlier,
> and it looks like the code in the regex directory is exactly the same as
> Walter Waldo's version, which has at least one comment from the middle of
> last decade (~2003).  Have people thought about migrating to the PCRE
> library?  It seems to have a lot of neat features, and also has a JIT, and
> it looks like it is being actively maintained and has decent comments.

It strikes me that you might benefit from reading the full thread.  As
Tom mentioned previously, pcre would require user-visible changes in
behavior, including cases where things which work today wouldn't work.
That requires a pretty high bar and I don't think we're anywhere near
there with this.

Thanks,

Stephen




Re: [HACKERS] leakproof

2012-02-19 Thread Don Baccus

On Feb 19, 2012, at 5:42 PM, Tom Lane wrote:

> Robert Haas  writes:
>> Having now spent far too much time in bed with that patch, I'm feeling
>> like the concept that we are really looking for there is what some
>> languages call "pure" - that is, there must be no side effects,
>> whether by throwing exceptions or otherwise.
> 
> Hmm, "pure" doesn't sound bad to me.  Nice and short.
> 

Technically, "pure" is stronger than "has no side effects":

http://en.wikipedia.org/wiki/Pure_function

Result can't depend on state (for instance, database contents), either.  This 
is the typical definition used in functional programming.

gcc extends this to allow use of global variables in a "pure" function (the 
stricter definition is met by "const" functions).  PG has "immutable", so a 
slightly weaker "pure" probably wouldn't be terribly confusing given the gcc 
precedent (probably across their family of compilers).

"D" adopts the stricter definition of "pure". 

So there's some confusion around the term.

But …

I picked up this thread after "leakproof" was settled on and was curious as to 
what "leakproof" was supposed to be as I didn't read the earlier posts.  I 
assumed it meant "doesn't leak memory", which seems admirable and typical and 
not needful of an attribute on the function declaration.

"pure" is definitely less confusing IMO, if it's congruent with the weaker 
sense of "pure" that's found in some languages/implementations.


Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org









Re: [HACKERS] leakproof

2012-02-19 Thread Greg Stark
I suspect this is wrong for similar reasons as "pure" but I'll throw
it out there: "hermetic"

-- 
greg



Re: [HACKERS] leakproof

2012-02-19 Thread Tom Lane
Don Baccus  writes:
> On Feb 19, 2012, at 5:42 PM, Tom Lane wrote:
>> Hmm, "pure" doesn't sound bad to me.  Nice and short.

> Technically, "pure" is stronger than "has no side effects":
> http://en.wikipedia.org/wiki/Pure_function
> Result can't depend on state (for instance, database contents), either.  This 
> is the typical definition used in functional programming.

Well, that condition is subsumed in our idea of an immutable function.
It's not clear to me whether pure/leakproof functions are meant to be a
strict subset of immutable functions, but if they are then they meet
this stricter definition.  On the other hand, if pure/leakproof functions
don't have to be immutable but only stable, then the stricter definition
corresponds to "pure immutable".  That still doesn't sound too bad, as
long as we define our terms clearly in the docs.

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Stark
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane  wrote:
>  A larger point is that it'd be a real shame
> for the Spencer regex engine to die off, because it is in fact one of
> the best pieces of regex technology on the planet.
...
> Another possible long-term answer is to finish the work Henry never did,
> that is make the code into a standalone library.  That would make it
> available to more projects and perhaps attract other people to help
> maintain it.  However, that looks like a lot of work too, with distant
> and uncertain payoff.

I can't see how your first claim that the Spencer code is worth
keeping around because it's just a superior regex implementation has
much force unless we can accomplish the latter. If the library can be
split off into a standalone library then it might have some longevity.
But if we're the only ones maintaining it then it's just prolonging
the inevitable. I can't see Postgres having its own special brand of
regexes that nobody else uses being an acceptable situation forever.

One thing that concerns me more and more is that most sufficiently
powerful regex implementations are susceptible to DOS attacks. A
database application is quite likely to allow users to decide directly
or indirectly what regexes to apply, and it can be hard to predict
which regexes will cause which implementations to blow up in CPU or
memory requirements. We need a library that can be used to defend
against malicious regexes, and I suspect neither Perl's nor Python's
library will suffice for this.

-- 
greg



Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-19 Thread Tom Lane
Jan Urbański  writes:
>> On 18/02/12 21:17, Tom Lane wrote:
>>> Dave Malcolm at Red Hat has been working on a static code analysis tool
>>> for Python-related C code.  He reports here on some preliminary results
>>> for plpython.c:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=795011

> Here's a patch that fixes everything I was sure was an actual bug. The
> rest of the warnings seem to be caused by the tool not knowing that
> elog(ERROR) throws a longjmp and things like "we never unref this
> object, so it can't disappear mid-execution".

This looks pretty sane to me, but it would probably be better if one of
the more python-savvy committers took responsibility for final review.

My only comment is whether elog(ERROR) is appropriate, ie, do we consider
these to be internal errors that users will never see in practice?
If there's a significant risk of the error being thrown in the field,
it might be better to use ereport, to expose the message for
translation.

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Greg,

* Greg Stark (st...@mit.edu) wrote:
> I can't see how your first claim that the Spencer code is worth
> keeping around because it's just a superior regex implementation has
> much force unless we can accomplish the latter. If the library can be
> split off into a standalone library then it might have some longevity.
> But if we're the only ones maintaining it then it's just prolonging
> the inevitable. I can't see Postgres having its own special brand of
> regexes that nobody else uses being an acceptable situation forever.
> 
> One thing that concerns me more and more is that most sufficiently
> powerful regex implementations are susceptible to DOS attacks. A
> database application is quite likely to allow users to decide directly
> or indirectly what regexes to apply and it can be hard to predict
> which regexes will cause which implementations to blow up in CPU or
> memory requirements. We need a library that can be used to defend
> against malicious regexes, and I suspect neither Perl's nor Python's
> library will suffice for this.

Alright, I'll bite..  Which existing regexp implementation that's well
written, well maintained, and which is well protected against malicious
regexes should we be considering then?

While we might not be able to formalize the regex code as a stand-alone
library, my bet would be that the Tcl folks (and anyone else using this
code...) will be paying attention to the changes and improvements we're
making.  Sure, it'd be easier for them to incorporate those changes if
they could just pull in a new version of the library, but we can't all
have our cake and eat it too.

Thanks,

Stephen




Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Greg Stark  writes:
> ...  We need a library that can be used to defend
> against malicious regexes, and I suspect neither Perl's nor Python's
> library will suffice for this.

Yeah.  Did you read the Russ Cox papers referenced upthread?  One of the
things Google wanted was provably limited resource consumption, which
motivated them going with a pure-DFA-no-exceptions implementation.
However, they gave up backrefs to get that, which is probably a
compromise we're not willing to make.

One thing that's been bothering me for awhile is that we don't have any
CHECK_FOR_INTERRUPTS or equivalent in the library's NFA search loops.
It wouldn't be hard to add one but that'd be putting PG-specific code
into the very heart of the library, which is something I've tried to
resist.  One of the issues we'll have to face if we do try to split it
out as a standalone library is how that type of requirement can be met.
(And, BTW, that's the kind of hack that we would probably not get to
make at all with any other library, so the need for it is not evidence
that getting away from Spencer's code would be a good thing.)
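One way a standalone library could meet that requirement is a caller-supplied cancel hook polled periodically from the search loop. A hedged sketch follows -- the names (rx_set_cancel and friends) are invented, and the loop below merely stands in for the real NFA search:

```c
#include <stddef.h>

#define RX_OK        0
#define RX_CANCELLED 1

typedef int (*rx_cancel_cb)(void *arg);    /* nonzero means abort */

static rx_cancel_cb cancel_cb = NULL;
static void *cancel_arg = NULL;

/* The embedding application installs its own check here; PG's callback
   could perform CHECK_FOR_INTERRUPTS-style processing without any
   PG-specific code living inside the library itself. */
static void rx_set_cancel(rx_cancel_cb cb, void *arg)
{
    cancel_cb = cb;
    cancel_arg = arg;
}

/* Stand-in for the NFA search loop: poll the hook every 1000 steps so
   the common path stays cheap. */
static int rx_search(long total_steps, long *steps_done)
{
    for (long i = 0; i < total_steps; i++) {
        if (i % 1000 == 0 && cancel_cb != NULL && cancel_cb(cancel_arg)) {
            *steps_done = i;
            return RX_CANCELLED;
        }
    }
    *steps_done = total_steps;
    return RX_OK;
}

/* Example policy: give the search a fixed budget of polls. */
static int budget_exhausted(void *arg)
{
    long *polls_left = arg;
    return (*polls_left)-- <= 0;
}
```

The same hook would also give other embedders (Tcl, say) a place to plug in their own event checks, which is exactly the sort of per-application hack that is hard to get into a third-party library.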

regards, tom lane



Re: [HACKERS] wal_buffers

2012-02-19 Thread Fujii Masao
On Mon, Feb 20, 2012 at 3:08 AM, Robert Haas  wrote:
> On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira
>  wrote:
>> On 19-02-2012 02:24, Robert Haas wrote:
>>> I have attached tps scatterplots.  The obvious conclusion appears to
>>> be that, with only 16MB of wal_buffers, the buffer "wraps around" with
>>> some regularity
>>>
>> Isn't it useful to print some messages on the log when we have "wrap around"?
>> In this case, we have an idea that wal_buffers needs to be increased.
>
> I was thinking about that.  I think that what might be more useful
> than a log message is a counter somewhere in shared memory.  Logging
> imposes a lot of overhead, which is exactly what we don't want here,
> and the volume might be quite high on a system that is bumping up
> against this problem.  Of course then the question is... how would we
> expose the counter value?

There is no existing statistics view suitable to include such information.
What about defining pg_stat_xlog or something?
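A minimal sketch of the counter idea -- standalone C11 atomics standing in for PG's shared-memory stats machinery, and the names (a pg_stat_xlog-style wal_buffers_full) are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical counter a pg_stat_xlog-style view could expose.  In the
   server this would live in shared memory; here a process-global atomic
   stands in for that. */
static atomic_ulong wal_buffers_full;

/* Called on the slow path, when an inserter had to wait for the WAL
   buffer to wrap around; a relaxed add keeps the overhead negligible
   compared with emitting a log line per event. */
static void count_wal_buffer_wrap(void)
{
    atomic_fetch_add_explicit(&wal_buffers_full, 1, memory_order_relaxed);
}

/* Read side: a monotonically increasing statistic, so a relaxed load
   is sufficient for reporting. */
static unsigned long read_wal_buffer_wraps(void)
{
    return atomic_load_explicit(&wal_buffers_full, memory_order_relaxed);
}
```

The point of the design is that the hot path pays one cache-friendly increment rather than the formatting and I/O cost of a log message.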

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-19 Thread Greg Smith

On 02/18/2012 02:35 PM, Robert Haas wrote:

I see CheckpointWriteDelay calling BgBufferSync
in 9.1.  Background writing would stop during the sync phase and
perhaps slow down a bit during checkpoint writing, but I don't think
it was stopped completely.


The sync phase can be pretty long here--that's where the worst-case 
latency figures lasting many seconds are coming from.  When checkpoints 
are happening every 60 seconds as in some of these cases, that can 
represent a decent percentage of time.  Similarly, when the OS cache 
fills, the write phase might block for a larger period of time than 
normally expected.  But, yes, you're right that my "BGW is active twice 
as much in 9.2" comments are overstating the reality here.


I'm collecting one last bit of data before posting another full set of 
results, but I'm getting more comfortable that the issue here is simply 
changes in the BGW behavior.  The performance regression tracks the 
background writer maximum intensity.  I can match the original 9.1 
performance just by dropping bgwriter_lru_maxpages, in cases where TPS 
drops significantly between 9.2 and 9.1.  At the same time, some cases 
that improve between 9.1 and 9.2 perform worse if I do that.  If whether 
9.2 gains or loses compared to 9.1 is adjustable with a tunable 
parameter, with some winning and others losing at the defaults, that path 
forward is reasonable to deal with.  The fact that pgbench is an unusual 
write workload is well understood, and I can write something documenting 
this possibility before 9.2 is officially released.  I'm a lot less 
stressed that there's really a problem here now.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-19 Thread Greg Smith

On 02/19/2012 05:37 AM, Simon Riggs wrote:

Please retest with wal_buffers 128MB, checkpoint_segments 1024


The test parameters I'm using aim to run through several checkpoint 
cycles in 10 minutes of time.  Bumping up against the ugly edges of 
resource bottlenecks is part of the test.  Increasing 
checkpoint_segments like that would lead to time driven checkpoints, 
either 1 or 2 of them during 10 minutes.  I'd have to increase the total 
testing time by at least 5X to get an equal workout of the system.  That 
would be an interesting data point to collect if I had a few weeks to 
focus just on that test.  I think that's more than pgbench testing 
deserves though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] wal_buffers

2012-02-19 Thread Greg Smith

On 02/19/2012 12:24 AM, Robert Haas wrote:

I think we might want to consider
adjusting our auto-tuning formula for wal_buffers to allow for a
higher cap, although this is obviously not enough data to draw any
firm conclusions.


That's an easy enough idea to throw into my testing queue.  The 16MB 
auto-tuning upper bound was just the easiest number to suggest that was 
obviously useful and unlikely to be wasteful.  One of the reasons 
wal_buffers remains a user-visible parameter was that no one ever 
really did an analysis of what its useful upper bound was--and that 
number might move up as other bottlenecks are smashed too.
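The auto-tuning rule being discussed reduces to a clamp, so raising the cap is a one-constant change. A sketch of my reading of the rule (wal_buffers = -1 means shared_buffers/32, bounded below by 64kB and above by one 16MB WAL segment):

```c
/* Sketch of the wal_buffers auto-tune, working in kB: 1/32 of
   shared_buffers, clamped to [64kB, cap_kb].  Today's cap is one WAL
   segment (16MB = 16384kB); the thread suggests testing a higher one. */
static long autotune_wal_buffers_kb(long shared_buffers_kb, long cap_kb)
{
    long kb = shared_buffers_kb / 32;

    if (kb < 64)
        kb = 64;               /* floor: 64kB */
    if (kb > cap_kb)
        kb = cap_kb;           /* ceiling: the constant under discussion */
    return kb;
}
```

With 2GB of shared_buffers the formula wants 64MB, so the 16MB cap is what binds on the machines being benchmarked here; a raised cap changes only that last comparison.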


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Smith

On 02/19/2012 10:28 PM, Greg Stark wrote:

One thing that concerns me more and more is that most sufficiently
powerful regex implementations are susceptible to DOS attacks.


There's a list of "evil regexes" at http://en.wikipedia.org/wiki/ReDoS

The Perl community's reaction to Russ Cox's regex papers has some 
interesting comments along these lines too:  
http://www.perlmonks.org/?node_id=597262


That brings up the backreferences concerns Tom already mentioned.  
Someone also points out the Thompson NFA that Cox advocates in his first 
article can use an excessive amount of memory when processing Unicode:  
http://www.perlmonks.org/?node_id=597312


Aside--Cox's "Regular Expression Matching with a Trigram Index" is an 
interesting intro to trigram use for FTS purposes, and might have some 
inspirational ideas for further progress in that area:  
http://swtch.com/~rsc/regexp/regexp4.html
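The core trick in that paper is easy to sketch: a regex that must contain some literal substring can only match documents whose index contains every trigram of that substring, so the index query prefilters candidates before the real regex runs (illustrative code, not pg_trgm's actual implementation):

```c
#include <string.h>

/* Extract the overlapping 3-grams of a required literal; an index scan
   can then demand all of them before the regex engine is ever invoked. */
static int trigrams(const char *s, char out[][4], int max)
{
    size_t len = strlen(s);
    int n = 0;

    for (size_t i = 0; i + 3 <= len && n < max; i++) {
        memcpy(out[n], s + i, 3);
        out[n][3] = '\0';
        n++;
    }
    return n;
}
```

For "regexp" this yields reg, ege, gex, exp; a document missing any of those four trigrams cannot match and is never handed to the matcher.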


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] leakproof

2012-02-19 Thread Don Baccus

On Feb 19, 2012, at 7:24 PM, Tom Lane wrote:

> Don Baccus  writes:
>> On Feb 19, 2012, at 5:42 PM, Tom Lane wrote:
>>> Hmm, "pure" doesn't sound bad to me.  Nice and short.
> 
>> Technically, "pure" is stronger than "has no side effects":
>> http://en.wikipedia.org/wiki/Pure_function
>> Result can't depend on state (for instance, database contents), either.  
>> This is the typical definition used in functional programming.
> 
> Well, that condition is subsumed in our idea of an immutable function.

Yes, I said that myself, perhaps you didn't bother to read closely?


> It's not clear to me whether pure/leakproof functions are meant to be a
> strict subset of immutable functions

Superset, not subset, unless my guessing is wrong.  How could "pure" be a 
subset of "immutable"?

OK, at this point, proponents will explain why ...

But if you're not clear as to what a "leakproof" function is meant to be, then 
I suggest the term must be defined very clearly, so everyone understands 
what it is meant to be.

> , but if they are then they meet
> this stricter definition.  On the other hand, if pure/leakproof functions
> don't have to be immutable but only stable, then the stricter definition
> corresponds to "pure immutable".  That still doesn't sound too bad, as
> long as we define our terms clearly in the docs.

Sure, let those making the proposal make things clear.

Just speaking as a gadfly who's not posted here for probably close on 10 years …



Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org









Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Jay Levitt

Stephen Frost wrote:

Alright, I'll bite..  Which existing regexp implementation that's well
written, well maintained, and which is well protected against malicious
regexes should we be considering then?


FWIW, there's a benchmark here that compares a number of regexp engines, 
including PCRE, TRE and Russ Cox's RE2:


http://lh3lh3.users.sourceforge.net/reb.shtml

The fastest backtracking-style engine seems to be oniguruma, which is native 
to Ruby 1.9 and thus not only supports Unicode but I'd bet performs pretty 
well on it, on account of its being developed in Japan.  But it goes pathological 
on regexen containing '|'; the only safe choice among PCRE-style engines is 
RE2, but of course that doesn't support backreferences.


Russ's page on re2 (http://code.google.com/p/re2/) says:

"If you absolutely need backreferences and generalized assertions, then RE2 
is not for you, but you might be interested in irregexp, Google Chrome's 
regular expression engine."


That's here:

http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Sadly, it's in JavaScript.  Seems like if you need a safe, performant regexp 
implementation, your choice is (a) finish PLv8 and support it on all 
platforms, or (b) add backreferences to RE2 and precompile it to C with 
Comeau (if that's still around), or...


Jay



Re: [HACKERS] leakproof

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 10:24 PM, Tom Lane  wrote:
> Don Baccus  writes:
>> On Feb 19, 2012, at 5:42 PM, Tom Lane wrote:
>>> Hmm, "pure" doesn't sound bad to me.  Nice and short.
>
>> Technically, "pure" is stronger than "has no side effects":
>> http://en.wikipedia.org/wiki/Pure_function
>> Result can't depend on state (for instance, database contents), either.  
>> This is the typical definition used in functional programming.
>
> Well, that condition is subsumed in our idea of an immutable function.
> It's not clear to me whether pure/leakproof functions are meant to be a
> strict subset of immutable functions, but if they are then they meet
> this stricter definition.  On the other hand, if pure/leakproof functions
> don't have to be immutable but only stable, then the stricter definition
> corresponds to "pure immutable".  That still doesn't sound too bad, as
> long as we define our terms clearly in the docs.

For the present application (pushdown into security views), we really
only care whether the function has side effects, such as throwing an
error or mutating global state.  So, in theory, even a volatile
function could be leakproof - it could read (but not write) some piece
of global, volatile state.  In practice, I'm not sure those cases are
important at all.  Right now, the only things marked as leakproof are
relational operators that might be indexable, precisely so that we
might be able to push an indexable qual down far enough to allow an
index scan, even in the presence of an intervening security view.
Maybe someone will want to push down a qual like x > now() or x >
clock_timestamp(), but I guess I can't get that excited about that.
There are so few leakproof functions that the chances of making
pushdown work safely for much of anything beyond col = const seem
remote.  So, my tea leaves are telling me that if we want to make pure
a subset of immutable, that probably isn't going to cause a problem.
However, I am not a CTLR (certified tea leaf reader).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-19 Thread Robert Haas
On Sun, Feb 19, 2012 at 11:12 PM, Greg Smith  wrote:
> I'm collecting one last bit of data before posting another full set of
> results, but I'm getting more comfortable the issue here is simply changes
> in the BGW behavior.  The performance regression tracks the background
> writer maximum intensity.

That's really quite fascinating... but it seems immensely
counterintuitive.  Any idea why?  BufFreelist contention between the
background writer and regular backends leading to buffer allocation
stalls, maybe?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: XML helper functions

2012-02-19 Thread Larry
But using the above code, how do we deal with multiple matching values?

For example:


   java 
   c++ 


In this case, perhaps I would want something like

---+-
my_question  | java
my_question  | c++



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Proposal-XML-helper-functions-tp2018975p5497064.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-19 Thread Amit Kapila
I was trying to understand this patch and had a few doubts:

1. In PerformXLogInsert(), why is there a need to check free space when the
   space has already been reserved during ReserveXLogInsertLocation()?
   Is it possible for the record size to be more than what was actually
   calculated in ReserveXLogInsertLocation()? If so, what I understand is
   that it moves to the next page to write; however, isn't it possible that
   some other backend had already reserved that space?

2. In WaitForXLogInsertionSlotToBecomeFree(), there is a chance that when
   nextslot equals lastslot, all new backends trying to reserve a slot will
   start waiting on the same last slot, which can lead to serialization for
   those backends and can impact latency.

3. GetXlogBuffer() will get called twice: once for the normal buffer, and a
   second time when there is not enough space in the current page. Both
   calls can lead to I/O, whereas in the earlier algorithm the chance of
   I/O is only once.
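
To make the reserve-then-copy scheme concrete while discussing these doubts, here is a toy sketch in Python (purely illustrative -- the real patch works on shared memory in C with insertion slots, and this model deliberately ignores page boundaries, slot wraparound, and flush tracking): space is reserved by bumping an insert pointer inside a short critical section, and the copy into the buffer happens outside the lock, so copies from different backends can proceed concurrently.

```python
import threading

class ToyWal:
    """Toy model of reserve-then-copy log insertion (not the actual patch)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.insert_pos = 0
        # Stands in for the short reservation critical section
        # (ReserveXLogInsertLocation in the patch).
        self.reserve_lock = threading.Lock()

    def insert(self, record):
        # Step 1: reserve space for the record -- the only serialized part.
        with self.reserve_lock:
            start = self.insert_pos
            self.insert_pos += len(record)
        # Step 2: copy into the reserved, non-overlapping range outside
        # the lock, so backends can copy in parallel.
        self.buf[start:start + len(record)] = record
        return start

log = ToyWal(1024)
threads = [threading.Thread(target=log.insert, args=(b"rec%d" % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every backend copies into its own reserved range, the buffer ends up holding all eight records regardless of scheduling order; what the toy omits is exactly the hard part the questions above touch on: crossing a page boundary after reservation, and waiting for a free insertion slot.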



-Original Message-
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Heikki Linnakangas
Sent: Friday, February 17, 2012 9:07 PM
To: Fujii Masao
Cc: Jeff Janes; Robert Haas; PostgreSQL-development
Subject: Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work
outside WALInsertLock)

On 17.02.2012 07:27, Fujii Masao wrote:
> Got another problem: when I ran pg_stop_backup to take an online 
> backup, it got stuck until I had generated new WAL record. This 
> happens because, in the patch, when pg_stop_backup forces a switch to 
> new WAL file, old WAL file is not marked as archivable until next new 
> WAL record has been inserted, but pg_stop_backup keeps waiting for that
WAL file to be archived.
> OTOH, without the patch, WAL file is marked as archivable as soon as 
> WAL file switch occurs.
>
> So, in short, the patch seems to handle the WAL file switch logic
incorrectly.

Yep. For a WAL-switch record, XLogInsert returns the location of the end of
the record, not the end of the empty padding space. So when the caller
flushed up to that point, it didn't flush the empty space and therefore
didn't notify the archiver.

Attached is a new version, fixing that, and the off-by-one bug you pointed out
in the slot wraparound handling. I also moved code around a bit; I think this
new division of labor between the XLogInsert subroutines is more readable.

Thanks for the testing!

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Displaying accumulated autovacuum cost

2012-02-19 Thread Fujii Masao
On Sat, Feb 18, 2012 at 2:16 AM, Robert Haas  wrote:
> On Fri, Feb 17, 2012 at 5:04 AM, Fujii Masao  wrote:
>> Here are review comments:
>>
>> The document about EXPLAIN needs to be updated.
>>
>> You forgot to add the long-integer-valued property of 
>> shared/local_blks_dirtied.
>> So when I ran EXPLAIN and used json as a format, no information about
>> blks_dirtied
>> was reported.
>
> Thanks for the review.  Updated patch attached.

Thanks for updating the patch!

The patch looks good to me. But I have three minor comments:


In pg_stat_statements--1.1.sql
+/* contrib/pg_stat_statements/pg_stat_statements--1.0.sql */

Typo: s/1.0/1.1


In pg_stat_statements--1.0--1.1.sql, shouldn't we complain if the script is
sourced in psql, as follows?

\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.1'" to
load this file. \quit


+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+   pg_stat_statements--unpackaged--1.0.sql

Though I'm not familiar with CREATE EXTENSION: why did you exclude 1.0.sql
from DATA? In hstore/Makefile, 1.0.sql is included. Do you think we should
prevent the old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
