Re: [HACKERS] Condition to become the standby mode.

2013-07-26 Thread Andres Freund
Hi,

On 2013-07-26 13:19:34 +0900, Sawada Masahiko wrote:
 When the slave server starts, it performs the following
 steps in StartupXLOG():
 1. Read the latest CheckPoint record LSN from the pg_control file.
 2. Try to fetch the CheckPoint record from the pg_xlog directory first.
   ( The server tries to read back to the prior CheckPoint record from pg_control )
 3. If it is not in pg_xlog, the slave server requests the CheckPoint
 record from the primary server.

 In #3, this works only when StandbyMode is true. For StandbyMode to be
 true, the database cluster state should be DB_SHUTDOWNED (it is one of
 the conditions).
 That is, the slave server can try to fetch the checkpoint record from the
 master server only after the slave server shut down cleanly.

It also will fetch the xlog from remote when we've found a backup label
(the read_backup_label() branch in StartupXLOG() will set StandbyMode to
true). Which will be the case if we're recovering from something like a
base backup.

The reason we don't fetch from remote if there's no backup label *and*
the last checkpoint wasn't a shutdown checkpoint is that that's doing
*local* crash recovery. Usually it will happen because we needed to
restart after a crash (or immediate restart which basically is the
same).
There's one valid case where you can get into the situation otherwise as
well, which is that the *entire* database directory has been copied via
an atomic snapshot. But in that case you *need* to also snapshot
pg_xlog.
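For illustration, here is a minimal sketch (not the actual StartupXLOG()
code) of the rule described above: WAL is fetched from the primary only
when standby mode is entered, and standby mode is entered only when we can
be sure this is not plain local crash recovery.

---
#include <stdbool.h>

static bool
enters_standby_mode(bool backup_label_found, bool db_state_shutdowned)
{
	if (backup_label_found)
		return true;	/* restoring e.g. a base backup */
	if (db_state_shutdowned)
		return true;	/* pg_control says DB_SHUTDOWNED: clean shutdown */
	return false;		/* otherwise local crash recovery: read pg_xlog only */
}
---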

Maybe there's another valid scenario?

 If this problem is solved, it becomes possible to fail back
 by removing all the WAL records in pg_xlog before the server
 starts as the slave server.
 ( And if we also use synchronous_transfer, which I'm proposing, I think
 we can fail back without taking a full backup )

I still have *massive* doubts about the concept. But anyway, if you want
to do so, you should generate a backup label that specifies the startup
location.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] inconsistent state after crash recovery

2013-07-26 Thread Andres Freund
Hi,

On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote:
 I received a question about inconsistent state after crash recovery.
 
 When a table file is broken (or just lost), PostgreSQL cannot recover
 the whole table, and does not show any notice while recovering.
 I think that means an inconsistent state.
 
 (1) create a table, and fill records.
 (2) process a checkpoint.
 (3) fill more records.
 (4) force a crash, and delete the table file.
 (5) run recovery on restarting.
 (6) only records after the checkpoint can be recovered.
 
 For example, the attached log shows that PostgreSQL can recover
 only 1058 records in a table which contained 2000 records
 before the crash, and does not say anything in the server log.
 
 Is this expected or acceptable?

I'd say it's both. WAL replay doesn't have the knowledge to detect that
in all too many cases. Nearly always a page's contents will be restored
by a full page image the first time it gets changed, so the pages will
individually look completely normal.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] comment for fast promote

2013-07-26 Thread Fujii Masao
On Fri, Jul 26, 2013 at 11:19 AM, Tomonari Katsumata
katsumata.tomon...@po.ntts.co.jp wrote:
 Hi Fujii-san,

 Thank you for response.


 (2013/07/25 21:15), Fujii Masao wrote:
 On Thu, Jul 25, 2013 at 5:33 PM, Tomonari Katsumata
 katsumata.tomon...@po.ntts.co.jp wrote:
 Hi,

 Now I'm reading xlog.c in 93_stable to study fast promote,
 and I have a question.

 I think it has an extra unlink command for the promote file.
 (on line 9937)
 ---
 9934 if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
 9935 {
 9936 unlink(FAST_PROMOTE_SIGNAL_FILE);
 9937 unlink(PROMOTE_SIGNAL_FILE);
 9938 fast_promote = true;
 9939 }
 ---

 Is this command necessary?

 Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
 both promote files exist.

 The command (unlink(PROMOTE_SIGNAL_FILE)) here is for an
 unusual case,
 namely when both of the procedures below have been done:
  - the user created a promote file in PGDATA
  - the user issued pg_ctl promote

 I understand the reason.
 But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
 unlink(FAST_PROMOTE_SIGNAL_FILE).
 Because FAST_PROMOTE_SIGNAL_FILE is definitely there but
 PROMOTE_SIGNAL_FILE is only sometimes there.
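(For illustration, the proposed reordering would look like this; a sketch
of the suggestion, not committed code:)
---
if (stat(FAST_PROMOTE_SIGNAL_FILE, &stat_buf) == 0)
{
	unlink(PROMOTE_SIGNAL_FILE);		/* only sometimes there; ignore failure */
	unlink(FAST_PROMOTE_SIGNAL_FILE);	/* definitely there */
	fast_promote = true;
}
---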

I could not understand why that's better. Could you elaborate on that?

 And I have another question related to this behavior.
 I think the TriggerFile should be removed too.
 This is a corner case, but it can happen.
 What do you think of it?

I don't have a strong opinion about that. I've never heard a complaint
about the current behavior so far.

 One question is: do we really still need to support normal promote?
 pg_ctl promote provides only a way to do fast promotion. If we want to
 do normal promotion, we need to create PROMOTE_SIGNAL_FILE
 and send the SIGUSR1 signal to the postmaster by hand. This seems messy.

 I think that we should either remove normal promotion altogether, or change
 pg_ctl promote so that it also provides a way to do normal promotion.

 I think the merit of fast promote is
  - allowing quick connections by skipping the checkpoint
 and its demerit is
  - crash recovery taking a little bit longer

 If a crash soon after promoting seldom happens,
 and fast promote never breaks the consistency of the database cluster,
 I think we don't need normal promotion.

You can execute a checkpoint after fast promotion for that.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements

2013-07-26 Thread Karol Trzcionka
On 26.07.2013 02:44, Fabrízio de Royes Mello wrote:
 Should be... I fixed that in the attached patch.
Hello, as far as I can see there are more inconsistent places.
First style:
OperatorCreate
---
Second style:
ProcedureCreate
TypeCreate
DefineTSParser
DefineType
DefineEnum
---
Third style:
CreateCast
DefineTSDictionary
DefineTSTemplate
DefineTSConfiguration
DefineRange
DefineCompositeType
Regards,
Karol




Re: [HACKERS] dynamic background workers, round two

2013-07-26 Thread Andres Freund
On 2013-07-25 12:35:30 -0400, Robert Haas wrote:
 On Wed, Jul 24, 2013 at 5:34 PM, Andres Freund and...@2ndquadrant.com wrote:
  This seems like a sensible idea to me. But, in the context of dynamic
  query, don't we also need the reverse infrastructure of notifying a
  bgworker that the client that requested it to be started has died?
  Ending up with a dozen bgworkers running because the normal backend
  FATALed doesn't seem nice.
 
 I think that will be desirable for many, although not all, uses of
 background workers.  For example, you might want to be able to do
 something like SELECT pg_background('CREATE INDEX CONCURRENTLY ...')
 and then exit your session and have the background worker continue to
 run.  Or maybe somebody writes a trigger that starts a background
 worker (if there's not one already working) and queues up work for it,
 and the background worker continues processing the queue
 until it is empty, and then exits.

 But for parallel query specifically, yeah, we'll probably want the
 untimely demise of either the original backend or any of its workers
 to result in the death of all the workers and an abort of the parent's
 transaction.  However, there's no need for the postmaster to be
 involved in any of that.  For example, once the worker process is up
 and running and the parent has its PID, it can use an on_shmem_exit
 hook or a PG_ENSURE_ERROR_CLEANUP() block to send SIGTERM to any
 still-living workers.
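For illustration, a minimal sketch of the cleanup idea quoted above,
assuming a hypothetical pid-tracking array; the (int code, Datum arg)
callback shape is what PG_ENSURE_ERROR_CLEANUP() expects:

---
static pid_t worker_pids[64];	/* hypothetical bookkeeping, filled as workers start */
static int   nworker_pids = 0;

static void
terminate_live_workers(int code, Datum arg)
{
	int		i;

	for (i = 0; i < nworker_pids; i++)
		if (worker_pids[i] != 0)
			kill(worker_pids[i], SIGTERM);	/* best effort; worker may be gone */
}

/* ... in the parent, once the worker PIDs are known: */
PG_ENSURE_ERROR_CLEANUP(terminate_live_workers, (Datum) 0);
{
	/* drive the parallel query */
}
PG_END_ENSURE_ERROR_CLEANUP(terminate_live_workers, (Datum) 0);
---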

It doesn't need to be the postmaster, but I think we need to provide
central infrastructure for that. I don't want this to end up being
redone poorly in multiple places.
I just wanted to mention it, it obviously doesn't need to be implemented
at the same time.

  +#define InvalidPid   (-1)
  +
 
  That value strikes me as a rather bad idea: as a pid it represents
  init's process group, i.e. kill(InvalidPid) will try to kill init... I'd
  rather not spread that outside async.c.
 
 Well, if we ever try to kill(InvalidPid), that's a PostgreSQL bug.
 And if the postgres process has permission to send signals to init,
 that's an OS bug (or a PostgreSQL bug, since we refuse to run as
 root).  So I'm not very worried about it.

I don't think somebody mistakenly kill()ing somebody with -1 is all too
likely, but it's a valid value to pass. That's why I dislike using it as a
constant for an invalid value.

 I've got two functions that
 need distinguishable return values for the not-yet-started case and
 the already-dead case, neither of which can be real PIDs.  If we don't
 use 0 and -1, the alternative is presumably to complicate the calling
 signature with another out parameter.  That does not seem like a net
 improvement.

I actually think those cases would be better served by an error code
mechanism which is extensible.

  +/*
  + * Report the PID of a newly-launched background worker in shared memory.
  + *
  + * This function should only be called from the postmaster.
  + */
  +void
  +ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
  +{
  + BackgroundWorkerSlot *slot;
  +
 + Assert(rw->rw_shmem_slot < max_worker_processes);
 + slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 + slot->pid = rw->rw_pid;
 +
 + if (rw->rw_worker.bgw_notify_pid != 0)
 + kill(rw->rw_worker.bgw_notify_pid, SIGUSR1);
  +}
 
  Any reason not to use SendProcSignal() properly here? Just that you
  don't know the BackendId? It seems unclean to reuse SIGUSR1 without using
  the infrastructure built for that.
 
 We're in the postmaster here.  The postmaster currently uses kill
 directly in all cases, rather than SendProcSignal(), and I'd rather
 not change that.  Any code that the postmaster calls has got to be
 safe against arbitrary shared memory corruption, and I'd rather keep
 that footprint as small as possible.  For example, if we added
 bgw_backend_id, the postmaster would have to bounds check it to make
 sure it didn't seg fault.  The more of that kind of stuff we add, the
 more chances there are to destabilize the postmaster.  I'd like to
 keep it as minimal as possible.

I can see the point in the argument, but it still seems to be grotty
architecturally.

  + for (;;)
  + {
  + pid = GetBackgroundWorkerPid(handle);
  + if (pid != InvalidPid)
  + break;
  + rc = WaitLatch(&MyProc->procLatch,
  +                WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
  + if (rc & WL_POSTMASTER_DEATH)
  + return InvalidPid;
 
  This basically ignores postmaster death. With unix_latch.c that
  shouldn't be a problem because afaics its implementation should
  correctly report an error in the next WaitLatch without waiting as well. I
  am not sure about the win32 implementation; is that ok there as well?
 
 I don't understand what you mean by this.  Why does it ignore
 postmaster death?

Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Greg Smith

On 7/25/13 6:02 PM, didier wrote:

It was surely already discussed, but why isn't PostgreSQL writing
its cache sequentially to a temporary file?


If you do that, reads of the data will have to traverse that temporary 
file to assemble their data.  You'll make every later reader pay the 
random I/O penalty that's being avoided right now.  Checkpoints are 
already postponing these random writes as long as possible. You have to 
take care of them eventually though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Hannu Krosing
On 07/26/2013 11:42 AM, Greg Smith wrote:
 On 7/25/13 6:02 PM, didier wrote:
 It was surely already discussed, but why isn't PostgreSQL writing
 its cache sequentially to a temporary file?

 If you do that, reads of the data will have to traverse that temporary
 file to assemble their data.  You'll make every later reader pay the
 random I/O penalty that's being avoided right now.  Checkpoints are
 already postponing these random writes as long as possible. You have
 to take care of them eventually though.

Well, SSD disks do it in the way proposed by didier (AFAIK), by putting
random fs pages on one large disk page and having an extra index layer
for resolving the random-to-sequential ordering.

I would not dismiss the idea without more tests and discussion.

We could have a system where the checkpoint does sequential writes of dirty
buffers to alternating synced holding files (a checkpoint log :) )
and only the background writer does random writes, with no forced sync at all.


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Hannu Krosing
On 07/26/2013 11:42 AM, Greg Smith wrote:
 On 7/25/13 6:02 PM, didier wrote:
 It was surely already discussed, but why isn't PostgreSQL writing
 its cache sequentially to a temporary file?

 If you do that, reads of the data will have to traverse that temporary
 file to assemble their data.  
In case of crash recovery, a sequential read of this file could be
performed as a first step.

This should work fairly well in most cases, at least when shared_buffers
at recovery is not smaller than the latest run of checkpoint-written
dirty buffers.




-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin

2013-07-26 Thread marco atzeri

On 7/25/2013 11:02 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:

Jeff Janes asked me about this, and Bruce just tripped up on it.
Usually on Windows it's necessary to have libpq.dll/cygpq.dll either
in the PATH or in the same directory as client .exe files. The
buildfarm client has for many years simply copied this dll from the
installation lib to the installation bin directory after running
make install. But I can't really see why we don't do that as part
of make install anyway. I haven't tested but I think something
like this patch would achieve this goal - it would fix something
that's tripped a lot of people up over the years.


Seems a reasonable workaround for a silly platform bug.  Do you need to
patch the MSVC stuff as well?


Comments? If we do this, should it be backported?


To 9.3, sure, but not further back, as there's probably little point.


On Cygwin there's no need to go before 9.3.
We already move them during the package build as a workaround.




+ifneq (,$(findstring $(PORTNAME), win32 cygwin))
+   $(INSTALL_DATA) $(shlib) '$(DESTDIR)$(bindir)/$(shlib)'
+endif


I wonder if someday we will create a win64 $(PORTNAME) value, or
something like that, and then we'll have to update the port list on
which these hacks are applied all over the place.  Not complaining
about this patch in particular, just an idle thought.



Andrew,
are you planning to also include

http://www.postgresql.org/message-id/e1uqoyt-0007qc...@wrigleys.postgresql.org
http://www.postgresql.org/message-id/51b59794.3000...@gmail.com

to get rid of dllwrap?

Regards
Marco





Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Greg Smith

On 7/26/13 5:59 AM, Hannu Krosing wrote:

Well, SSD disks do it in the way proposed by didier (AFAIK), by putting
random fs pages on one large disk page and having an extra index layer
for resolving the random-to-sequential ordering.


If your solution to avoiding random writes now is to do sequential ones 
into a buffer, you'll pay for it by having more expensive random reads 
later.  In the SSD write buffer case, that works only because those 
random reads are very cheap.  Do the same thing on a regular drive, and 
you'll be paying a painful penalty *every* time you read in return for 
saving work *once* when you write.  That only makes sense when your 
workload is near write-only.


It's possible to improve on this situation by having some sort of 
background process that goes back and cleans up the random data, 
converting it back into sequentially ordered writes again.  SSD 
controllers also have firmware that does this sort of work, and Postgres 
might do it as part of vacuum cleanup.  But note that such work faces 
exactly the same problems as writing the data out in the first place.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-26 Thread Amit Kapila
On Friday, July 26, 2013 10:35 AM Alvaro Herrera wrote:
 Josh Berkus wrote:
  On 07/25/2013 02:02 PM, Tom Lane wrote:
   Robert Haas robertmh...@gmail.com writes:
  
   My thought is that people might put postgresql.conf in a directory
   that only contains configuration files and isn't writeable by the
   postgres user.  So I would expect to find postgresql.auto.conf in
   the data directory always.
  
   Yeah, exactly.  I think putting it anywhere but $PGDATA is a bad idea,
   and a sysadmin who thinks he knows better probably doesn't.
 
  Please see Greg Smith's fairly strong arguments for putting it in the
  config/ directory.
 
 As far as I see, there are two arguments he makes:
 
 1. We ought to have conf.d/ (not config/) so that tools have a way to
    deploy snippets (e.g. pgtune)
 
 2. We ought to have all the config files together so that versioning
    tools (Puppet) can just say "keep all files within directory X
    versioned" and not have to care about specific file names, etc.
 
 
 
 I can buy (1), because that's a pretty common design for daemons
 nowadays.  But I think that's its own patch, and there's no reason that
 this patch should be messing with this.  I don't care all that much
 about (2), but I have no problem with doing that.
 
 So we could have two patches: a first one that introduces a conf.d subdir
 that's automatically parsed after postgresql.conf, and another one that
 implements ALTER SYSTEM by using a file within conf.d.  The reason I say
 we need a separate patch for conf.d is that I think it'd be easier to
 argue about it in isolation than having it be entangled with the ALTER
 SYSTEM stuff.  The main contention point I see is where conf.d lives;
 the two options are in $PGDATA or together with postgresql.conf.  Tom
 and Robert, above, say it should be in $PGDATA; but this goes against
 Debian packaging and the Linux FHS (or whatever that thing is called).

Sure, I think we can split it into 2 patches, but I doubt that will make it
any easier to get this feature completed.
The contention point can delay the first patch (automatic parsing of
conf.d), which can further delay the ALTER SYSTEM SET patch, and that will
eventually lead to its death.

With Regards,
Amit Kapila.





Re: [HACKERS] inconsistent state after crash recovery

2013-07-26 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote:
 Is this expected or acceptable?

 I'd say it's both.

Postgres is built on the assumption that the underlying filesystem is
reliable, ie, once you've successfully fsync'd some data that data won't
disappear.  If the filesystem fails to honor that contract, it's a
filesystem bug not a Postgres bug.  Nor is it reasonable to expect
Postgres to be able to detect every such violation.  As an example,
would you expect crash recovery to notice the disappearance of a file
that was touched nowhere in the replayed actions?

regards, tom lane




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 On 7/26/13 5:59 AM, Hannu Krosing wrote:
  Well, SSD disks do it in the way proposed by didier (AFAIK), by putting
  random fs pages on one large disk page and having an extra index layer
  for resolving the random-to-sequential ordering.

 If your solution to avoiding random writes now is to do sequential ones 
 into a buffer, you'll pay for it by having more expensive random reads 
 later.

What I'd point out is that that is exactly what WAL does for us, ie
convert a bunch of random writes into sequential writes.  But sooner or
later you have to put the data where it belongs.

regards, tom lane




Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-26 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 The main contention point I see is where conf.d lives;
 the two options are in $PGDATA or together with postgresql.conf.  Tom
 and Robert, above, say it should be in $PGDATA; but this goes against
 Debian packaging and the Linux FHS (or whatever that thing is called).

Ordinarily, if postgresql.conf is not in $PGDATA, it will be somewhere
that the postmaster does not (and should not) have write permissions
for.  I have no objection to inventing a conf.d subdirectory, I just say
that it must be under $PGDATA.  The argument that this is against FHS
is utter nonsense, because anything we write there is not static
configuration, it's just data.

Come to think of it, maybe part of the reason we're having such a hard
time getting to consensus is that people are conflating the snippet
part with the writable part?  I mean, if you are thinking you want
system-management tools to be able to drop in configuration fragments as
separate files, there's a case to be made for a conf.d subdirectory that
lives somewhere that the postmaster can't necessarily write.  We just
mustn't confuse that with support for ALTER SYSTEM SET.  I strongly
believe that ALTER SYSTEM SET must not be designed to write anywhere
outside $PGDATA.
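For illustration, the reading order implied by this argument could be
sketched like so (helper names are hypothetical, not the actual guc-file.l
API):

---
static void
read_all_configuration(const char *config_dir, const char *pgdata)
{
	/* static configuration; may live outside $PGDATA, read-only for us */
	parse_config_file(config_dir, "postgresql.conf");

	/* drop-in fragments from system-management tools */
	parse_config_snippets(config_dir, "conf.d");

	/* the file ALTER SYSTEM writes; always under $PGDATA, read last so
	 * its settings win */
	parse_config_file(pgdata, "postgresql.auto.conf");
}
---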

regards, tom lane




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 OK.  I've taken care of all remaining uses of SnapshotNow in the code
 base.  I think we can go ahead and remove it, now.  Patch attached.

 (And there was, hopefully, much rejoicing.)

What about SnapshotSelf?

regards, tom lane




Re: [HACKERS] dynamic background workers, round two

2013-07-26 Thread Robert Haas
On Fri, Jul 26, 2013 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 It doesn't need to be the postmaster, but I think we need to provide
 central infrastructure for that. I don't want this to end up being
 redone poorly in multiple places.
 I just wanted to mention it, it obviously doesn't need to be implemented
 at the same time.

OK, makes sense.  I'm still working out a design for this part of it;
I think I want to get the dynamic shared memory stuff working first.

  +#define InvalidPid   (-1)
  +
 
  That value strikes me as a rather bad idea: as a pid it represents
  init's process group, i.e. kill(InvalidPid) will try to kill init... I'd
  rather not spread that outside async.c.

 Well, if we ever try to kill(InvalidPid), that's a PostgreSQL bug.
 And if the postgres process has permission to send signals to init,
 that's an OS bug (or a PostgreSQL bug, since we refuse to run as
 root).  So I'm not very worried about it.

 I don't think somebody mistakenly kill()ing somebody with -1 is all too
 likely, but it's a valid value to pass. That's why I dislike using it as a
 constant for an invalid value.

Well, so is any negative number, but we should never be trying to kill
process groups.

 I've got two functions that
 need distinguishable return values for the not-yet-started case and
 the already-dead case, neither of which can be real PIDs.  If we don't
 use 0 and -1, the alternative is presumably to complicate the calling
 signature with another out parameter.  That does not seem like a net
 improvement.

 I actually think those cases would be better served by an error code
 mechanism which is extensible.

Hmm.  What signature would you suggest?
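For illustration only, one possible shape for such an interface (a sketch,
not something either participant has posted): the pid travels through an
out parameter and is defined only when the status says the worker is
running, so no pid value needs to be overloaded as "invalid".

---
typedef enum
{
	BGWH_STARTED,			/* worker is running; *pidp is valid */
	BGWH_NOT_YET_STARTED,	/* postmaster has not launched it yet */
	BGWH_STOPPED,			/* worker has already exited */
	BGWH_POSTMASTER_DIED	/* it will never be launched */
} BgwHandleStatus;

extern BgwHandleStatus GetBackgroundWorkerPid(BackgroundWorkerHandle *handle,
                                              pid_t *pidp);
---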

 I can see the point in the argument, but it still seems to be grotty
 architecturally.

Suck it up.  :-)

I considered the idea of using SIGUSR2, but it didn't seem worth it.

 I don't understand what you mean by this.  Why does it ignore postmaster 
 death?

 The other callers that check for WL_POSTMASTER_DEATH actually perform
 drastic measures like exit()ing upon postmaster death. Here you just
 return. With the unix latch implementation it seems to be guaranteed
 that the next WaitLatch() done somewhere else will also return
 WL_POSTMASTER_DEATH. I am not sure whether it might actually consume
 the death in the windows implementation.

Well, IMHO, every backend should exit as quick as possible upon
discovering that the postmaster has died.  However, for complex
political reasons (ah-Tom-Lane-choo!), our policy is to have the
auxiliary processes exit and regular backends soldier on.  This seems
completely stupid to me, but it's not this patch's job to reverse that
policy decision.

To put that another way, one could equally well ask why WaitLatch(),
or for that matter PostmasterIsAlive(), doesn't contain ereport(FATAL)
upon discovering that the postmaster has given up the ghost.  And the
answer is that it's the job of those functions to provide information,
not make policy decisions.

 I don't have a problem with the restriction, but I'd like to see a check
 against it. Maybe check for MyBackendId != InvalidBackendId in
 RegisterDynamicBackgroundWorker()? That would also prevent starting
 further bgworkers before BackgroundWorkerInitializeConnection() is done
 in a connected bgworker which seems to be a good thing.

Well, that's easy enough to fix.  Should we Assert() or elog() or what?
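For illustration, the elog() variant might look like this (an excerpt-style
sketch, not the committed code):

---
bool
RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
                                BackgroundWorkerHandle **handle)
{
	/* refuse registration from processes without a valid BackendId,
	 * e.g. a bgworker that has not yet called
	 * BackgroundWorkerInitializeConnection() */
	if (MyBackendId == InvalidBackendId)
		elog(ERROR, "dynamic background workers can only be registered from a regular backend");

	/* ... normal registration path continues here ... */
}
---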

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Andres Freund
On 2013-07-26 08:49:38 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  OK.  I've taken care of all remaining uses of SnapshotNow in the code
  base.  I think we can go ahead and remove it, now.  Patch attached.
 
  (And there was, hopefully, much rejoicing.)
 
 What about SnapshotSelf?

I thought about that yesterday. I think we should replace the usages
which aren't easily removable (the constraint stuff) with an MVCC
snapshot, just one that treats our transaction's current CommandId as
visible. That should be doable?
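For illustration, a rough sketch of that idea (it glosses over snapshot
copying/registration, and such an API does not exist as-is):

---
static Snapshot
get_self_visible_mvcc_snapshot(void)
{
	/* start from an ordinary MVCC snapshot ... */
	Snapshot	snap = GetLatestSnapshot();

	/* ... but bump the command-id horizon so changes made by our own
	 * transaction's current command count as visible too */
	snap->curcid = GetCurrentCommandId(false) + 1;
	return snap;
}
---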

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Andres Freund
On 2013-07-25 19:24:53 -0400, Robert Haas wrote:
 (And there was, hopefully, much rejoicing.)

Definitely! Thanks.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Robert Haas
On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 OK.  I've taken care of all remaining uses of SnapshotNow in the code
 base.  I think we can go ahead and remove it, now.  Patch attached.

 (And there was, hopefully, much rejoicing.)

 What about SnapshotSelf?

Well, that's still used in _bt_check_unique, unique_key_recheck
(trigger function to do a deferred uniqueness check), RI_FKey_check,
and rather extensively by sepgsql.  I don't really have much desire to
do the work to get rid of it, though.

Getting rid of SnapshotNow is arguably important on the grounds that
third-party code may be using it, and doing this will force them to do
it the new way instead, and that's got some value.  I'm not sure if
anything's already been committed that relies on MVCC catalog access,
but several things have certainly been proposed and it's a good bet
that 9.4 will rely on the catalog access using MVCC-semantics, so
forcing third-party code to stop using SnapshotNow will prevent subtle
bugs.

But there's no similar joy for SnapshotSelf.  You can argue that all
of the things that we're doing with it are crufty, but nobody's
complaining about any of them, and some of them are in places where
the cost of an additional MVCC snapshot on every iteration might be
much more serious than anything we ever saw for catalog scans.  So I'm
personally content to leave it well enough alone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extensions makefiles - coverage

2013-07-26 Thread Robert Haas
On Thu, Jul 25, 2013 at 11:07 AM, Ronan Dunklau rdunk...@gmail.com wrote:
 Hello.

 I was having trouble figuring out how to use the coverage targets when
 using an extension.
 I am using approximately the layout that was proposed here:
 http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net
 It looks like everything is hard-coded to take the source and the
 gcda, gcno files in the base directory, but these files lie in a src
 directory with the proposed layout.

 It may be better to base the .gcda file discovery on the OBJS
 variables when using PGXS.

 Please find attached a small patch that implements this. There is
 probably a better way than the redundant rm $(gcda_files) / rm *.gcda
 to clean up the generated files.

 With the attached patch, the following targets seem to have the same
 behaviour as on the current HEAD, both on the whole tree and on
 individual contrib modules:

 - coverage-html
 - clean
 - coverage-clean
 - clean-coverage

 I noticed that make clean leaves gcda and gcov files on the current
 HEAD, and this is no different with the given patch.

 I also tested it against several pgxn extensions, and it seems to work fine.

You can ensure that your patch doesn't get forgotten about by adding it here:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread didier
Hi,


On Fri, Jul 26, 2013 at 11:42 AM, Greg Smith g...@2ndquadrant.com wrote:

 On 7/25/13 6:02 PM, didier wrote:

  It was surely already discussed, but why isn't PostgreSQL writing
  its cache sequentially to a temporary file?


 If you do that, reads of the data will have to traverse that temporary
 file to assemble their data.  You'll make every later reader pay the random
 I/O penalty that's being avoided right now.  Checkpoints are already
 postponing these random writes as long as possible. You have to take care
 of them eventually though.


 No, the log file is only used at recovery time.

In the checkpoint code:
- loop over the cache, marking dirty buffers with BM_CHECKPOINT_NEEDED as in
the current code
- other workers can't write and evict these marked buffers to disk;
there's a race with fsync.
- the checkpoint fsyncs now, or after the next step.
- the checkpoint loops again and saves these buffers to the log, clearing
BM_CHECKPOINT_NEEDED but *not* clearing BM_DIRTY; of course many buffers
will be written again, as they are when a checkpoint isn't running.
- checkpoint done.

During recovery you have to load the log into cache first, before applying the WAL.
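For illustration, a compact sketch of that sequence (helper names are
hypothetical; the flags are the ones named above):

---
static void
checkpoint_with_sequential_log(void)
{
	/* pass 1: mark dirty buffers, as the current code does */
	mark_dirty_buffers(BM_CHECKPOINT_NEEDED);

	/* fsync the data files now, or after the next step */
	fsync_data_files();

	/* pass 2: dump the marked buffers sequentially into the log file,
	 * clearing BM_CHECKPOINT_NEEDED but deliberately keeping BM_DIRTY,
	 * so the usual random writes still happen later */
	write_marked_buffers_to_log("checkpoint.log");

	/* recovery then loads checkpoint.log sequentially before WAL replay */
}
---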

Didier


Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What about SnapshotSelf?

 Well, that's still used in _bt_check_unique, unique_key_recheck
 (trigger function to do a deferred uniqueness check), RI_FKey_check,
 and rather extensively by sepgsql.  I don't really have much desire to
 do the work to get rid of it, though.

Hm.  I agree the first three may be all right, but I can't help
suspecting that sepgsql is doing the wrong thing here.

regards, tom lane




Re: [HACKERS] Extensions makefiles - coverage

2013-07-26 Thread Ronan Dunklau
Thank you for the tip, it's done.

2013/7/26 Robert Haas robertmh...@gmail.com:
 On Thu, Jul 25, 2013 at 11:07 AM, Ronan Dunklau rdunk...@gmail.com wrote:
 Hello.

 I was having trouble figuring out how to use the coverage targets when
 using an extension.
 I am using approximately the layout that was proposed here:
 http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net
 It looks like everything is hard-coded to take the source and the
 gcda, gcno files in the base directory, but these files lie in a src
 directory with the proposed layout.

 It may be better to base the .gcda file discovery on the OBJS
 variables when using PGXS.

 Please find attached a small patch that implements this. There is
 probably a better way than the redundant rm $(gcda_files) / rm *.gcda
 to clean up the generated files.

 With the attached patch, the following targets seem to have the same
 behaviour as on the current HEAD, both on the whole tree and on
 individual contrib modules:

 - coverage-html
 - clean
 - coverage-clean
 - clean-coverage

 I noticed that make clean leaves gcda and gcov files on the current
 HEAD, and this is no different with the given patch.

 I also tested it against several pgxn extensions, and it seems to work fine.

 You can ensure that your patch doesn't get forgotten about by adding it here:

 https://commitfest.postgresql.org/action/commitfest_view/open

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Greg Smith

On 7/26/13 9:14 AM, didier wrote:

During recovery you have to load the log into cache first, before applying WAL.


Checkpoints exist to bound recovery time after a crash.  That is their 
only purpose.  What you're suggesting moves a lot of work into the 
recovery path, which will increase how long recovery takes.


More work at recovery time means someone who uses the default of 
checkpoint_timeout='5 minutes', expecting that crash recovery won't take 
very long, will discover it does take a longer time now.  They'll be 
forced to shrink the value to get the same recovery time as they do 
currently.  You might need to make checkpoint_timeout 3 minutes instead, 
if crash recovery now has all this extra work to deal with.  And when 
the time between checkpoints drops, it will slow the fundamental 
efficiency of checkpoint processing down.  You will end up writing out 
more data in the end.


The interval between checkpoints and recovery time are all related.  If 
you let any one side of the current requirements slip, it makes the rest 
easier to deal with.  Those are all trade-offs though, not improvements. 
 And this particular one is already an option.


If you want less checkpoint I/O per capita and don't care about recovery 
time, you don't need a code change to get it.  Just make 
checkpoint_timeout huge.  A lot of checkpoint I/O issues go away if you 
only do a checkpoint per hour, because instead of random writes you're 
getting sequential ones to the WAL.  But when you crash, expect to be 
down for a significant chunk of an hour, as you go back to sort out all 
of the work postponed before.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Robert Haas
On Fri, Jul 26, 2013 at 9:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 26, 2013 at 8:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What about SnapshotSelf?

 Well, that's still used in _bt_check_unique, unique_key_recheck
 (trigger function to do a deferred uniqueness check), RI_FKey_check,
 and rather extensively by sepgsql.  I don't really have much desire to
 do the work to get rid of it, though.

 Hm.  I agree the first three may be all right, but I can't help
 suspecting that sepgsql is doing the wrong thing here.

sepgsql is using SnapshotSelf to find the old version of a tuple that
was updated by the core code just before.  That should be safe in the
sense that there can't be a currently-committing transaction somewhere
else that's updated that tuple, if we know that our own uncommitted
transaction has done a transactional update.  There was a recent
thread discussing whether another API might be better, and I'd be
prepared to concede that it might be.  But I don't think it's
drop-dead broken.

Not that I really object if someone wants to have a go at getting rid
of SnapshotSelf, but I think it'd be worth articulating what we hope
to accomplish by so doing.  For example, the btree README says the
following about the deletion algorithm:

---
...  The reason we do it is to
provide an interlock between non-full VACUUM and indexscans.  Since VACUUM
deletes index entries before deleting tuples, the super-exclusive lock
guarantees that VACUUM can't delete any heap tuple that an indexscanning
process might be about to visit.  (This guarantee works only for simple
indexscans that visit the heap in sync with the index scan, not for bitmap
scans.  We only need the guarantee when using non-MVCC snapshot rules; in
an MVCC snapshot, it wouldn't matter if the heap tuple were replaced with
an unrelated tuple at the same TID, because the new tuple wouldn't be
visible to our scan anyway.)
---

Obviously, when we were using SnapshotNow for catalog access, changing
anything here was a non-starter.  But now that we're not, it might be
worth asking whether there are few enough users of non-MVCC rules that
we could apply some suitable treatment to those that remain and then
change the locking protocol here.  And if we did do that, would there
be enough performance benefit to justify the work?  I don't have
answers to those questions, and the answer may well be that we should
leave things as they are, but I think the questions are worth thinking
about.

Aside from any possible advantage in further trimming the list of
available snapshot types, I'd like to spend some time thinking about
what we can do that's safe and useful in terms of reducing lock
levels; or even adding completely new facilities that would have been
DOA in the old world.  I have high hopes in both areas, but I wouldn't
be surprised to find that there are problems we haven't thought about
yet.  I think our dependence on SnapshotNow has wormed itself into our
design choices in deep ways, and I suspect it's going to take a good
deal of thought to figure out exactly what we can improve and how.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread Greg Smith

On 7/26/13 8:32 AM, Tom Lane wrote:

What I'd point out is that that is exactly what WAL does for us, ie
convert a bunch of random writes into sequential writes.  But sooner or
later you have to put the data where it belongs.


Hannu was observing that SSD often doesn't do that at all.  They can 
maintain logical->physical translation tables that track where each 
block was written to, forever.  When read seeks are really inexpensive, 
the only pressure to reorder blocks is wear leveling.


That doesn't really help with regular drives though, where the low seek 
time assumption doesn't play out so well.  The whole idea of writing 
things sequentially and then sorting them out later was all the rage in 2001 
for ext3 on Linux, as part of the data=journal mount option.  You can 
go back and see that people were confused but excited about the 
performance at 
http://www.ibm.com/developerworks/linux/library/l-fs8/index.html


Spoiler:  if you use a workload that has checkpoint issues, it doesn't 
help PostgreSQL latency.  Just like using a large write cache, you gain 
some burst performance, but eventually you pay for it with extra latency 
somewhere.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 26, 2013 at 9:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, that's still used in _bt_check_unique, unique_key_recheck
 (trigger function to do a deferred uniqueness check), RI_FKey_check,
 and rather extensively by sepgsql.  I don't really have much desire to
 do the work to get rid of it, though.

 Hm.  I agree the first three may be all right, but I can't help
 suspecting that sepgsql is doing the wrong thing here.

 sepgsql is using SnapshotSelf to find the old version of a tuple that
 was updated by the core code just before.

Oh.  OK, then it reduces to the same case as the other three, ie we're
looking at tuples we know to be update-locked.

 [ interesting ruminations snipped ]

Yeah, removing SnapshotNow catalog access certainly opens the doors
for a lot of new thinking.

regards, tom lane




Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Andres Freund
On 2013-07-26 09:50:32 -0400, Robert Haas wrote:
 sepgsql is using SnapshotSelf to find the old version of a tuple that
 was updated by the core code just before.  That should be safe in the
 sense that there can't be a currently-committing transaction somewhere
 else that's updated that tuple, if we know that our own uncommitted
 transaction has done a transactional update.  There was a recent
 thread discussing whether another API might be better, and I'd be
 prepared to concede that it might be.  But I don't think it's
 drop-dead broken.

It's safe for the tuples updated in that transaction, but it's not safe
to look at anything else if you expect results without the SnapshotNow
problems. E.g. looking at a newly created attribute is fine, but
iterating over all attributes is not necessarily.

I am more concerned about the care needed when placing
CommandCounterIncrement()'s somewhere though. It seems more than likely
that this will get repeatedly broken, even if it's not atm (which I
doubt). E.g. inheritance handling seems to be rather wonky WRT this.

 Not that I really object if someone wants to have a go at getting rid
 of SnapshotSelf, but I think it'd be worth articulating what we hope
 to accomplish by so doing.

Agreed. From the internal usages there doesn't seem to be too much
pressure.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Condition to become the standby mode.

2013-07-26 Thread Fujii Masao
On Fri, Jul 26, 2013 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2013-07-26 13:19:34 +0900, Sawada Masahiko wrote:
  When the slave server starts, it performs the following
  steps in StartupXLOG():
  1. Read the latest CheckPoint record LSN from the pg_control file.
  2. Try to fetch the CheckPoint record from the pg_xlog directory first.
    ( The server tries to read back to the prior CheckPoint record from pg_control )
  3. If it is not in pg_xlog, the slave server requests the CheckPoint
  record from the primary server.

  In #3, this works only when StandbyMode is true. For StandbyMode to be
  true, the database cluster state should be DB_SHUTDOWNED (it is one of
  the conditions).
  That is, the slave server can try to fetch the checkpoint record from the
  master server only after the slave server shut down cleanly.

 It also will fetch the xlog from remote when we've found a backup label
 (the read_backup_label() branch in StartupXLOG() will set StandbyMode to
 true). Which will be the case if we're recovering from something like a
 base backup.

 The reason we don't fetch from remote if there's no backup label *and*
 the last checkpoint wasn't a shutdown checkpoint is that that's doing
 *local* crash recovery.

Yes, in that case, local crash recovery occurs first. After it ends (i.e.,
we reach the end of the WAL available locally), we can enable standby
mode and fetch the WAL from the remote server.

  If this problem is solved, it becomes possible to fail back
  by removing all the WAL records in pg_xlog before the server
  starts as the slave server.
  ( And if we also use synchronous_transfer, which I'm proposing, I think
  we can fail back without taking a full backup )

 I still have *massive* doubts about the concept. But anyway, if you want
 to do so, you should generate a backup label that specifies the startup
 location.

Generating a backup label doesn't seem to be enough because there is
no backup-end WAL record and we cannot know the consistency point.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Condition to become the standby mode.

2013-07-26 Thread Andres Freund
On 2013-07-26 23:47:59 +0900, Fujii Masao wrote:
  If this problem is solved, it becomes possible to fail back
  by removing all the WAL records in pg_xlog before the server
  starts as the slave server.
  ( And if we also use synchronous_transfer, which I'm proposing, I think
  we can fail back without taking a full backup )
 
  I still have *massive* doubts about the concept. But anyway, if you want
  to do so, you should generate a backup label that specifies the startup
  location.
 
 Generating a backup label doesn't seem to be enough because there is
 no backup-end WAL record and we cannot know the consistency point.

Since 9.2 we allow generation of base backups from standbys; the
infrastructure built for that should be sufficient to pass the LSN at which
consistency is achieved.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-26 Thread didier
Hi,


On Fri, Jul 26, 2013 at 3:41 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 7/26/13 9:14 AM, didier wrote:

 During recovery you have to load the log into cache first, before applying
 WAL.


 Checkpoints exist to bound recovery time after a crash.  That is their
 only purpose.  What you're suggesting moves a lot of work into the recovery
  path, which will increase how long recovery takes.

 Yes, it's slower, but you're sequentially reading only one file, at most
the size of your buffer cache; moreover it takes constant time.

Let's say you make a checkpoint and crash just after, with a next-to-empty
WAL.

Now recovery is very fast, but you have to repopulate your cache with
random reads from requests.

With the snapshot it's slower, but you read, sequentially again, a lot of
hot cache you will need later when the db starts serving requests.

Of course the worst case is if it crashes just before a checkpoint: most of
the snapshot data are stale and will be overwritten by WAL ops.

But if the WAL recovery is CPU bound, loading from the snapshot may be
done concurrently while replaying the WAL.

More work at recovery time means someone who uses the default of
 checkpoint_timeout='5 minutes', expecting that crash recovery won't take
 very long, will discover it does take a longer time now.  They'll be forced
 to shrink the value to get the same recovery time as they do currently.
  You might need to make checkpoint_timeout 3 minutes instead, if crash
 recovery now has all this extra work to deal with.  And when the time
 between checkpoints drops, it will slow the fundamental efficiency of
 checkpoint processing down.  You will end up writing out more data in the
 end.

Yes, it's a trade-off: now you're paying the price at checkpoint time, every
time; with the log you're paying only once, at recovery.


 The interval between checkpoints and recovery time are all related.  If
 you let any one side of the current requirements slip, it makes the rest
 easier to deal with.  Those are all trade-offs though, not improvements.
  And this particular one is already an option.

 If you want less checkpoint I/O per capita and don't care about recovery
 time, you don't need a code change to get it.  Just make checkpoint_timeout
 huge.  A lot of checkpoint I/O issues go away if you only do a checkpoint
 per hour, because instead of random writes you're getting sequential ones
 to the WAL.  But when you crash, expect to be down for a significant chunk
 of an hour, as you go back to sort out all of the work postponed before.

It's not the same: it's a snapshot saved and loaded in constant time, unlike
the WAL log.

Didier


Re: [HACKERS] Condition to become the standby mode.

2013-07-26 Thread Fujii Masao
On Fri, Jul 26, 2013 at 11:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-07-26 23:47:59 +0900, Fujii Masao wrote:
  If this problem is solved, it becomes possible to fail back
  by removing all the WAL records in pg_xlog before the server
  starts as the slave server.
  ( And if we also use synchronous_transfer, which I'm proposing, I think
  we can fail back without taking a full backup )
 
  I still have *massive* doubts about the concept. But anyway, if you want
  to do so, you should generate a backup label that specifies the startup
  location.

 Generating a backup label doesn't seem to be enough because there is
 no backup-end WAL record and we cannot know the consistency point.

 Since 9.2 we allow generation of base backups from standbys; the
 infrastructure built for that should be sufficient to pass the LSN at which
 consistency is achieved.

Yeah, right. I'd forgotten about that.

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_upgrade -j broken on Windows

2013-07-26 Thread Bruce Momjian
On Thu, Jul 25, 2013 at 10:57:28AM -0400, Bruce Momjian wrote:
 Everyone should be aware that the 9.3 pg_upgrade -j/--jobs option on
 Windows is currently broken, and hopefully will be fixed by the next
 beta.
 
 Someone at PGDay UK told me they were getting pg_upgrade -j crashes on
 Windows.  Andrew Dunstan was able to reproduce the crash, and that has
 been fixed, but there is still a race condition that I am trying to
 diagnose.

After three days of testing and creating a Windows MinGW build
environment (with Andrew's help), I have come up with the attached patch
which fixes the pg_upgrade -j race condition on Windows.  In summary,
creating a file with fopen() from a non-primary thread and then calling
system() soon after can result in a file-in-use error.  The solution was
to issue the fopen() after system() in such cases.

This would be applied to head and 9.3.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_upgrade -j broken on Windows

2013-07-26 Thread Bruce Momjian
On Fri, Jul 26, 2013 at 01:27:34PM -0400, Bruce Momjian wrote:
 On Thu, Jul 25, 2013 at 10:57:28AM -0400, Bruce Momjian wrote:
  Everyone should be aware that the 9.3 pg_upgrade -j/--jobs option on
  Windows is currently broken, and hopefully will be fixed by the next
  beta.
  
  Someone at PGDay UK told me they were getting pg_upgrade -j crashes on
  Windows.  Andrew Dunstan was able to reproduce the crash, and that has
  been fixed, but there is still a race condition that I am trying to
  diagnose.
 
 After three days of testing and creating a Windows MinGW build
 environment (with Andrew's help), I have come up with the attached patch
 which fixes the pg_upgrade -j race condition on Windows.  In summary,
 creating a file with fopen() from a non-primary thread and then calling
 system() soon after can result in a file-in-use error.  The solution was
 to issue the fopen() after system() in such cases.
 
 This would be applied to head and 9.3.

Sorry, patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index ef123a8..3ec6e1c
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*************** static void validate_exec(const char *di
*** 21,26 ****
--- 21,27 ----
  
  #ifdef WIN32
  static int	win32_check_directory_write_permissions(void);
+ DWORD   mainThreadId = 0;
  #endif
  
  
*************** static int	win32_check_directory_write_p
*** 37,48 ****
   * If throw_error is true, this raises a PG_FATAL error and pg_upgrade
   * terminates; otherwise it is just reported as PG_REPORT and exec_prog()
   * returns false.
   */
  bool
  exec_prog(const char *log_file, const char *opt_log_file,
  		  bool throw_error, const char *fmt,...)
  {
! 	int			result;
  	int			written;
  
  #define MAXCMDLEN (2 * MAXPGPATH)
--- 38,51 ----
   * If throw_error is true, this raises a PG_FATAL error and pg_upgrade
   * terminates; otherwise it is just reported as PG_REPORT and exec_prog()
   * returns false.
+  *
+  * The code requires it be called first from the primary thread on Windows.
   */
  bool
  exec_prog(const char *log_file, const char *opt_log_file,
  		  bool throw_error, const char *fmt,...)
  {
! 	int			result = 0;
  	int			written;
  
  #define MAXCMDLEN (2 * MAXPGPATH)
*************** exec_prog(const char *log_file, const ch
*** 50,55 ****
--- 53,64 ----
  	FILE	   *log;
  	va_list		ap;
  
+ #ifdef WIN32
+ 	/* We assume we are called from the primary thread first */
+ 	if (mainThreadId == 0)
+ 		mainThreadId = GetCurrentThreadId();
+ #endif
+ 
  	written = strlcpy(cmd, SYSTEMQUOTE, sizeof(cmd));
  	va_start(ap, fmt);
  	written += vsnprintf(cmd + written, MAXCMDLEN - written, fmt, ap);
*************** exec_prog(const char *log_file, const ch
*** 61,66 ****
--- 70,91 ----
  	if (written >= MAXCMDLEN)
  		pg_log(PG_FATAL, "command too long\n");
  
+ 	pg_log(PG_VERBOSE, "%s\n", cmd);
+ 
+ #ifdef WIN32
+ 	/*
+ 	 * For some reason, Windows issues a file-in-use error if we write data
+ 	 * to the log file from a non-primary thread just before we create a
+ 	 * subprocess that also writes to the same log file.  One fix is to
+ 	 * sleep for 100ms.  A cleaner fix is to write to the log file _after_
+ 	 * the subprocess has completed, so we do this only when writing from
+ 	 * a non-primary thread.  fflush(), running system() twice, and
+ 	 * pre-creating the file do not seem to help.
+ 	 */
+ 	if (mainThreadId != GetCurrentThreadId())
+ 		result = system(cmd);
+ #endif
+ 
  	log = fopen(log_file, "a");
  
  #ifdef WIN32
*************** exec_prog(const char *log_file, const ch
*** 84,94 ****
  
  	if (log == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  #ifdef WIN32
! 	fprintf(log, "\n\n");
  #endif
- 	pg_log(PG_VERBOSE, "%s\n", cmd);
  	fprintf(log, "command: %s\n", cmd);
  
  	/*
  	 * In Windows, we must close the log file at this point so the file is not
--- 109,126 ----
  
  	if (log == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
+ 
  #ifdef WIN32
! 	/* Are we printing "command:" before its output? */
! 	if (mainThreadId == GetCurrentThreadId())
! 		fprintf(log, "\n\n");
  #endif
  	fprintf(log, "command: %s\n", cmd);
+ #ifdef WIN32
+ 	/* Are we printing "command:" after its output? */
+ 	if (mainThreadId != GetCurrentThreadId())
+ 		fprintf(log, "\n\n");
+ #endif
  
  	/*
  	 * In Windows, we must close the log file at this point so the file is not
*************** exec_prog(const char *log_file, const ch
*** 96,102 ****
  	 */
  	fclose(log);
  
! 	result = system(cmd);
  
  	if (result != 0)
  	{
--- 128,138 ----
  	 */
  	fclose(log);
  
! #ifdef WIN32
! 	/* see comment above */
! 	if (mainThreadId == GetCurrentThreadId())
! #endif
! 		result = system(cmd);
  
  	if (result != 0)
  	{
*** exec_prog(const char *log_file, const 

Re: [HACKERS] getting rid of SnapshotNow

2013-07-26 Thread Robert Haas
On Fri, Jul 26, 2013 at 10:16 AM, Andres Freund and...@2ndquadrant.com wrote:
 I am more concerned about the care needed when placing
 CommandCounterIncrement()'s somewhere though. It seems more than likely
 that this will get repeatedly broken, even if it's not atm (which I
 doubt). E.g. inheritance handling seems to be rather wonky WRT this.

There may well be bugs.  I am fine with reviewing patches to improve
the code in this area, but I don't plan to take it upon myself to
rewrite that code.  Either it's working as expected, or nobody's using
it, because we're not getting any bug reports.
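
For anyone who wants to audit, the pattern each call site has to get
right is roughly this (a simplified sketch, not any particular site):

	/* make a catalog change ... */
	simple_heap_insert(rel, tup);
	CatalogUpdateIndexes(rel, tup);

	/*
	 * ... and bump the command counter before anything later in the
	 * same command scans that catalog and needs to see the new row.
	 */
	CommandCounterIncrement();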

 Not that I really object if someone wants to have a go at getting rid
 of SnapshotSelf, but I think it'd be worth articulating what we hope
 to accomplish by so doing.

 Agreed. From the internal usages there doesn't seem to be too much
 pressure.

So unless there are objections to the patch as posted, I'm going to
apply that next week.  This in no way precludes more work in this area
later, but since we're likely to break third-party code with this
change, we might as well get it out of the way as early in the release
cycle as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-07-26 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Come to think of it, maybe part of the reason we're having such a hard
 time getting to consensus is that people are conflating the snippet
 part with the writable part?  I mean, if you are thinking you want
 system-management tools to be able to drop in configuration fragments as
 separate files, there's a case to be made for a conf.d subdirectory that
 lives somewhere that the postmaster can't necessarily write.  We just
 mustn't confuse that with support for ALTER SYSTEM SET.  I strongly
 believe that ALTER SYSTEM SET must not be designed to write anywhere
 outside $PGDATA.

Agreed.  To continue that thought, I find it *very* unlikely that a
given environment would use *both* a tool like puppet to manage the
files in their conf.d *and* have people using ALTER SYSTEM SET.  You're
going to do one or the other, almost certainly; not least because those
are very likely two different teams, and only one of them is going to
be responsible for the PG system config.
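
If anything, the config-management side reduces to one line in the
real postgresql.conf, pointing at a directory the postmaster need not
be able to write (path hypothetical):

	# fragments owned by the config-management team
	include_dir '/etc/postgresql/conf.d'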

Thanks,

Stephen


Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1

2013-07-26 Thread Tom Lane
Lonni J Friedman netll...@gmail.com writes:
 nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ;
 ERROR:  option "use_remote_estimate" not found

 Am I doing something wrong, or is this a bug?

[ experiments... ]  You need to say ADD, not SET, to add a new option to
the list.  SET might more appropriately be spelled REPLACE, because it
requires that the object already have a defined value for the option,
which will be replaced.

Our documentation appears not to disclose this fine point, but a look
at the SQL-MED standard says it's operating per spec.  The standard also
says that ADD is an error if the option is already defined, which is a
bit more defensible, but still not exactly what I'd call user-friendly.
And the error we issue for that case is pretty misleading too:

regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ;
ALTER SERVER
regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ;
ERROR:  option "use_remote_estimate" provided more than once

I think we could do with both more documentation, and better error
messages for these cases.  In the SET-where-you-should-use-ADD case,
perhaps

ERROR:  option "use_remote_estimate" has not been set
HINT: Use ADD not SET to define an option that wasn't already set.

In the ADD-where-you-should-use-SET case, perhaps

ERROR:  option "use_remote_estimate" is already set
HINT: Use SET not ADD to change an option's value.

The "provided more than once" wording would be appropriate if the same
option is specified more than once in the command text, but I'm not sure
that it's worth the trouble to detect that case.
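
In code form, I'd imagine something like this wherever the OPTIONS list
is merged (an untested sketch; "isset" stands in for however we detect
that the option is already defined):

	if (def->defaction == DEFELEM_SET && !isset)
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("option \"%s\" has not been set", def->defname),
				 errhint("Use ADD not SET to define an option that wasn't already set.")));
	else if (def->defaction == DEFELEM_ADD && isset)
		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_OBJECT),
				 errmsg("option \"%s\" is already set", def->defname),
				 errhint("Use SET not ADD to change an option's value.")));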

Thoughts, better wordings?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] postgres FDW cost estimation options unrecognized in 9.3-beta1

2013-07-26 Thread Lonni J Friedman
On Fri, Jul 26, 2013 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Lonni J Friedman netll...@gmail.com writes:
 nightly=# ALTER SERVER cuda_db10 OPTIONS (SET use_remote_estimate 'true') ;
 ERROR:  option "use_remote_estimate" not found

 Am I doing something wrong, or is this a bug?

 [ experiments... ]  You need to say ADD, not SET, to add a new option to
 the list.  SET might more appropriately be spelled REPLACE, because it
 requires that the object already have a defined value for the option,
 which will be replaced.

 Our documentation appears not to disclose this fine point, but a look
 at the SQL-MED standard says it's operating per spec.  The standard also
 says that ADD is an error if the option is already defined, which is a
 bit more defensible, but still not exactly what I'd call user-friendly.
 And the error we issue for that case is pretty misleading too:

 regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'true') ;
 ALTER SERVER
 regression=# ALTER SERVER cuda_db10 OPTIONS (use_remote_estimate 'false') ;
 ERROR:  option "use_remote_estimate" provided more than once

 I think we could do with both more documentation, and better error
 messages for these cases.  In the SET-where-you-should-use-ADD case,
 perhaps

 ERROR:  option "use_remote_estimate" has not been set
 HINT: Use ADD not SET to define an option that wasn't already set.

 In the ADD-where-you-should-use-SET case, perhaps

 ERROR:  option "use_remote_estimate" is already set
 HINT: Use SET not ADD to change an option's value.

 The "provided more than once" wording would be appropriate if the same
 option is specified more than once in the command text, but I'm not sure
 that it's worth the trouble to detect that case.

 Thoughts, better wordings?

Thanks Tom, I've confirmed that using ADD was the solution.  I think
your suggested ERROR & HINT text is an excellent improvement.  It
definitely would have given me the clue I was missing earlier.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Patch to add support of IF NOT EXISTS to others CREATE statements

2013-07-26 Thread Abhijit Menon-Sen
At 2013-07-26 10:39:00 +0200, karl...@gmail.com wrote:

 Hello, as I can see there are more inconsistent places.

Right. This is what I was referring to in my original review. All of the
relevant sites (pre-patch) that currently do:

    if (already exists)
        ereport(ERROR …)

should instead be made to do:

    if (already exists)
    {
        if (ifNotExists)
        {
            ereport(NOTICE …)
            return
        }

        ereport(ERROR …)
    }

or even (very slightly easier to review):

    if (already exists && ifNotExists)
    {
        ereport(NOTICE …)
        return
    }

    if (already exists)
        ereport(ERROR …)

I don't care much which of the two is used, so long as it's (a) the same
everywhere, and (b) there's no "if (!ifNotExists)" anywhere.
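
In real C, each site would then come out looking something like this
(names hypothetical, just to show the shape):

    if (OidIsValid(existingOid))
    {
        if (stmt->if_not_exists)
        {
            ereport(NOTICE,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("widget \"%s\" already exists, skipping",
                            stmt->name)));
            return InvalidOid;
        }

        ereport(ERROR,
                (errcode(ERRCODE_DUPLICATE_OBJECT),
                 errmsg("widget \"%s\" already exists", stmt->name)));
    }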

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-07-26 Thread Fabrízio de Royes Mello

On 25-07-2013 05:32, suresh.balasubra wrote:

Disclaimer: I am no hacker, just a PostgreSQL user, trying to provide a user
scenario where DISCARD SEQUENCES functionality is required.

We have designed and developed a small application-development platform for
which the backend is PostgreSQL.

There is a DBLayer which is responsible for generating SQL statements for all
the INSERT, UPDATE, and DELETE operations. Data can be pushed into multiple
tables using this layer. What we provide to this layer is just a DataSet
(.NET). DataTables in the DataSet will be named after their respective
tables. We also use DataColumn extended properties to push in additional
logic. All of this happens within a transaction and in one shot: just one
hit to the DB, by appending SQL statements in the proper order.

There is an interesting feature that we have built into this DBLayer:
auto linking. All tables in our system have a serial field 'id'.
Suppose there is a master table (let us call it 'voucher') and a detail
table ('voucher_lines'), and we are using the layer to push one record to
the master table and 50 records to the detail table. The 'voucher_lines'
table will have an integer column 'voucher_id'. '*_id' fields are
automatically populated with currval('*_id_seq'). All this works like a charm.

Now, imagine we want to push data into another table (say 'invoice') which
also has a field 'voucher_id'. This is a different activity, not connected
with the above-mentioned transaction. In this scenario the field still gets
populated, because currval('voucher_id_seq') returns a value. But we do not
want it to be populated. What we want is to resolve '*_id' fields into
values only within a transaction; after the transaction we want to be done
with the session state. If we could have cleared that state somehow
(DISCARD ALL does it, but not the currval sequence info) after the first
transaction, it would have worked for us.



I already sent a patch to implement DISCARD SEQUENCES [1], which I expect
to be part of the 9.4 release.


Regards,

[1] https://commitfest.postgresql.org/action/patch_view?id=1171

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers