Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert

2013-04-24 Thread Michael Paquier
On Tue, Apr 23, 2013 at 11:05 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-04-23 19:33:24 +0530, Jeevan Chalke wrote:
  On Tue, Apr 23, 2013 at 7:01 PM, Merlin Moncure mmonc...@gmail.com
 wrote:
 
   On Tue, Apr 23, 2013 at 7:18 AM, Jeevan Chalke
   jeevan.cha...@enterprisedb.com wrote:
Hi Tom,
   
Since we are close to release, we should not have crashes like this.
  
   huh? we are not even in beta yet
  
 
  I mean, beta release.
 
  BTW, as per Bruce's mail we are planning to package 9.3
  beta1 on *April 29*, with a release on May 2, so we are close enough.

 The 2nd May date is for the release of the packaged beta, not the overall
 release, unless I missed something major like months of testing ;)

Last year 9.2.0 was released at the beginning of September, so there would
be at least 4 months to test and improve 9.3 during beta, assuming the next
release follows the same timeline.
-- 
Michael


Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 01:10, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote:
 * make the pg_control.data_checksums field into a version number, for
 future flexibility...
 patch attached

 Commenting on this separately because it's a separate issue.

 I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
 checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
 a new algorithm without forcing existing users to change.

That's exactly what the patch does.

 initdb would have to take the code as an option, probably in string
 form.

When/if we have multiple options we can add that. The main thing was
to make sure the control file recorded things in a common way.
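
For concreteness, a sketch (illustrative only, not necessarily the committed
patch) of what an algorithm-ID field in pg_control could look like:

#include <stdint.h>

/*
 * Illustrative sketch: instead of a boolean on/off flag, pg_control
 * records which checksum algorithm the cluster was initialized with,
 * so a future release can add algorithm 2 without forcing existing
 * clusters to convert.  Names here are invented for illustration.
 */
typedef enum ChecksumAlgo
{
    CHECKSUM_NONE = 0,          /* checksums disabled */
    CHECKSUM_FNV1A_SR3 = 1      /* the 9.3 page-checksum algorithm */
} ChecksumAlgo;

typedef struct ControlFileSketch
{
    uint32_t    data_checksum_version;  /* holds a ChecksumAlgo value */
} ControlFileSketch;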

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 22.04.2013 18:44, Simon Riggs wrote:

On 22 April 2013 09:29, Heikki Linnakangas hlinnakan...@vmware.com wrote:


Hmm. That requires write access to $DATADIR, so that's not quite the same
thing as the trigger_file recovery.conf option.


Well, you also (elsewhere) requested that we must keep recovery.conf
in $DATADIR, so it needs to be writable.


That's a slightly different requirement. $DATADIR must be writable by 
the process that restores the backup or puts the server into standby 
mode, while trigger_file needs to be writable by the process that 
triggers failover. Those are not necessarily the same thing. I'm 
thinking of a heartbeat process that triggers failover by creating a 
file on an NFS server or similar. Possibly the same location where the 
WAL archive is located. $DATADIR would be stored elsewhere.
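
For illustration, a minimal setup along those lines (paths are hypothetical;
trigger_file is the standard recovery.conf option):

# recovery.conf on the standby -- $DATADIR stays local, the trigger
# lives on shared storage:
standby_mode = 'on'
primary_conninfo = 'host=primary port=5432'
trigger_file = '/mnt/nfs/cluster1/failover_trigger'

# The heartbeat process, on a different host, triggers failover without
# any write access to $DATADIR:
touch /mnt/nfs/cluster1/failover_trigger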


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 08:23, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 22.04.2013 18:44, Simon Riggs wrote:

 On 22 April 2013 09:29, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 Hmm. That requires write access to $DATADIR, so that's not quite the same
 thing as the trigger_file recovery.conf option.


 Well, you also (elsewhere) requested that we must keep recovery.conf
 in $DATADIR, so it needs to be writable.


 That's a slightly different requirement. $DATADIR must be writable by the
 process that restores the backup or puts the server into standby mode, while
 trigger_file needs to be writable by the process that triggers failover.
 Those are not necessarily the same thing. I'm thinking of a heartbeat
 process that triggers failover by creating a file on an NFS server or
 similar. Possibly the same location where the WAL archive is located.
 $DATADIR would be stored elsewhere.

The default you've requested is fast promotion and I've agreed to that.

The ability to write a file called "promote" to $DATADIR is there as a
protection in case we need it in the field; it's not going to be the
primary mechanism any more. So if you're not intending to use it ever,
it doesn't seem worth discussing the fact you don't like its location.

But if you do want to discuss it, I think it's unreasonable of you to
demand recovery.conf cannot be outside $DATADIR and then also demand
related files be relocatable outside $DATADIR as well, the exact
opposite. We had the chance to avoid making $DATADIR writable
externally and that's gone now, at least for now.

Here's the patch I was intending to apply. Please let me know if you
have comments.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


set_fast_promotion_as_default.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 10:57, Simon Riggs wrote:

On 24 April 2013 08:23, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 22.04.2013 18:44, Simon Riggs wrote:


On 22 April 2013 09:29, Heikki Linnakangas hlinnakan...@vmware.com
wrote:


Hmm. That requires write access to $DATADIR, so that's not quite the same
thing as the trigger_file recovery.conf option.


Well, you also (elsewhere) requested that we must keep recovery.conf
in $DATADIR, so it needs to be writable.


That's a slightly different requirement. $DATADIR must be writable by the
process that restores the backup or puts the server into standby mode, while
trigger_file needs to be writable by the process that triggers failover.
Those are not necessarily the same thing. I'm thinking of a heartbeat
process that triggers failover by creating a file on an NFS server or
similar. Possibly the same location where the WAL archive is located.
$DATADIR would be stored elsewhere.


The default you've requested is fast promotion and I've agreed to that.

The ability to write a file called "promote" to $DATADIR is there as a
protection in case we need it in the field; it's not going to be the
primary mechanism any more. So if you're not intending to use it ever,
it doesn't seem worth discussing the fact you don't like its location.


Ok, works for me.


But if you do want to discuss it, I think it's unreasonable of you to
demand recovery.conf cannot be outside $DATADIR and then also demand
related files be relocatable outside $DATADIR as well, the exact
opposite. We had the chance to avoid making $DATADIR writable
externally and that's gone now, at least for now.


As I said above, it's a different situation. recovery.conf has always 
been in $DATADIR, and it's always been possible to point trigger_file 
elsewhere, and you've always gotten full functionality using the 
trigger_file. I just want to maintain that status quo. Which your patch 
achieves, so I'm happy with that.



Here's the patch I was intending to apply. Please let me know if you
have comments.


Regarding the change in pg_ctl:


/*
-* Use two different kinds of promotion file so we can understand
-* the difference between smart and fast promotion.
+* For 9.3 onwards, use fast promotion as the default option.
+* Promotion with a full checkpoint is still possible by writing
+* a file called "promote", e.g.
+*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
 */
-   if (shutdown_mode >= FAST_MODE)
-   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);
-   else
-   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
+   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);


Should there be a version check there? I guess we've never guaranteed a 
newer pg_ctl to work with an older server version, but it seems likely 
that someone would try to do that, especially with "pg_ctl promote". 
With the above change, creating $DATADIR/fast_promote in a 9.2 server's 
data dir will do nothing. I'd suggest that we keep the filename 
unchanged, "promote", and only change the behavior on the server side, 
so that it performs fast promotion. If you want to have a slow promote 
file, we can call that "slow_promote" or "checkpoint_then_promote" or 
something.


Aside from that, I assume you'll clean up the now-dead "-m fast" option etc.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:10, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 Regarding the change in pg_ctl:

 /*
 -* Use two different kinds of promotion file so we can understand
 -* the difference between smart and fast promotion.
 +* For 9.3 onwards, use fast promotion as the default option.
 +* Promotion with a full checkpoint is still possible by writing
 +* a file called "promote", e.g.
 +*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
  */
 -   if (shutdown_mode >= FAST_MODE)
 -   snprintf(promote_file, MAXPGPATH, "%s/fast_promote",
 pg_data);
 -   else
 -   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
 +   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);


 Should there be a version check there? I guess we've never guaranteed a
 newer pg_ctl to work with an older server version, but it seems likely that
 someone would try to do that, especially with "pg_ctl promote". With the
 above change, creating $DATADIR/fast_promote in a 9.2 server's data dir will
 do nothing. I'd suggest that we keep the filename unchanged, "promote", and
 only change the behavior on the server side, so that it performs fast
 promotion. If you want to have a slow promote file, we can call that
 "slow_promote" or "checkpoint_then_promote" or something.

pg_ctl already checks versions, so I don't see the point.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 11:23, Simon Riggs wrote:

On 24 April 2013 09:10, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Regarding the change in pg_ctl:


 /*
-* Use two different kinds of promotion file so we can understand
-* the difference between smart and fast promotion.
+* For 9.3 onwards, use fast promotion as the default option.
+* Promotion with a full checkpoint is still possible by writing
+* a file called "promote", e.g.
+*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
  */
-   if (shutdown_mode >= FAST_MODE)
-   snprintf(promote_file, MAXPGPATH, "%s/fast_promote",
pg_data);
-   else
-   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
+   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);



Should there be a version check there? I guess we've never guaranteed a
newer pg_ctl to work with an older server version, but it seems likely that
someone would try to do that, especially with "pg_ctl promote". With the
above change, creating $DATADIR/fast_promote in a 9.2 server's data dir will
do nothing. I'd suggest that we keep the filename unchanged, "promote", and
only change the behavior on the server side, so that it performs fast
promotion. If you want to have a slow promote file, we can call that
"slow_promote" or "checkpoint_then_promote" or something.


pg_ctl already checks versions, so I don't see the point.


The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it 
will create the fast_promote file and return success. But it won't actually 
promote the server. I think that's bad.


If pg_ctl already has a check against that, fine, but I don't think it 
does. Please make sure you test that before applying.
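
For illustration, a minimal sketch of such a check -- comparing the data
directory's PG_VERSION against the binary's major version before writing a
promote file. This is invented code, not pg_ctl's actual logic:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define MAXPGPATH 1024             /* as in PostgreSQL's c.h */
#define PG_MAJORVERSION "9.3"      /* normally from pg_config.h */

static bool
datadir_version_matches(const char *pg_data)
{
    char    path[MAXPGPATH];
    char    line[64];
    FILE   *fp;
    bool    ok = false;

    snprintf(path, MAXPGPATH, "%s/PG_VERSION", pg_data);
    if ((fp = fopen(path, "r")) == NULL)
        return false;              /* no PG_VERSION: refuse to promote */
    if (fgets(line, sizeof(line), fp) != NULL)
        ok = (strncmp(line, PG_MAJORVERSION,
                      strlen(PG_MAJORVERSION)) == 0);
    fclose(fp);
    return ok;
}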


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:32, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 pg_ctl already checks versions, so I don't see the point.


 The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it will
 create the fast_promote file and return success. But it won't actually promote
 the server. I think that's bad.

 If pg_ctl already has a check against that, fine, but I don't think it does.
 Please make sure you test that before applying.

If it doesn't check, that is not the only thing that would be broken.
The original commit of pg_ctl promote would also be broken.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 11:46, Simon Riggs wrote:

On 24 April 2013 09:32, Heikki Linnakangas hlinnakan...@vmware.com wrote:


pg_ctl already checks versions, so I don't see the point.



The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it will
create the fast_promote file and return success. But it won't actually promote
the server. I think that's bad.

If pg_ctl already has a check against that, fine, but I don't think it does.
Please make sure you test that before applying.


If it doesn't check, that is not the only thing that would be broken.
The original commit of pg_ctl promote would also be broken.


Yeah, it would've been good if the "pg_ctl promote" patch had added 
a version check. Nevertheless, don't you think it would be good to avoid 
changing the filename of the promote file, so that we don't have any 
more such breakage? I don't see any advantage in changing it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
Hello,

On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote:
 Hi,
 
 I found a problem with failing start-up archive recovery at Standby mode in 
 PG9.2.4 streaming replication.
 I tested the same case in PG9.2.3, but it does not occur there...

  cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
  [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
  00020013 from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

  cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
 [* This point is illegal work] [Standby] 2013-04-22 01:27:25 EDTLOG:  0: 
 unexpected pageaddr 0/600 in log file 0, segment 20, offset 0
  [Standby] 2013-04-22 01:27:25 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
  cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
  cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
  [Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication 
  command: IDENTIFY_SYSTEM
  [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
  walsender.c:449
  [Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication 
  command: START_REPLICATION 0/1400
  [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
  walsender.c:449
  [Master] 2013-04-22 01:27:25 EDTFATAL:  58P01: requested WAL segment 
  00030014 has already been removed
  [Master] 2013-04-22 01:27:25 EDTLOCATION:  XLogRead, walsender.c:1023
  [Standby] 2013-04-22 01:27:25 EDTLOG:  0: streaming replication 
  successfully connected to primary
  [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_connect, 
  libpqwalreceiver.c:171
  [Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from 
  WAL stream: FATAL:  requested WAL segment 00030014 has 
  already been removed
  
  [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_receive, 
  libpqwalreceiver.c:389
  
  [mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat 
  `../arc/00030014': そのようなファイルやディレクトリはありません
  [Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 
  0/600 in log file 0, segment 20, offset 0
  [Standby] 2013-04-22 01:27:30 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
  cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
  cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
  [Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication 
  command: IDENTIFY_SYSTEM
  [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
  walsender.c:449
  [Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication 
  command: START_REPLICATION 0/1400
  [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
  walsender.c:449
  [Master] 2013-04-22 01:27:30 EDTFATAL:  58P01: requested WAL segment 
  00030014 has already been removed
  [Master] 2013-04-22 01:27:30 EDTLOCATION:  XLogRead, walsender.c:1023
  [Standby] 2013-04-22 01:27:30 EDTLOG:  0: streaming replication 
  successfully connected to primary
  [Standby] 2013-04-22 01:27:30 EDTLOCATION:  libpqrcv_connect, 
  libpqwalreceiver.c:171
  [Standby] 2013-04-22 01:27:30 EDTFATAL:  XX000: could not receive data from 
  WAL stream: FATAL:  requested WAL segment 00030014 has 
  already been removed

And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary, or you need to fix your
restore_command.
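
For example (values illustrative), either side can be addressed like this:

# postgresql.conf on the primary: retain enough segments for the
# standby to reconnect before they are recycled
wal_keep_segments = 128

# recovery.conf on the standby: restore_command must exit non-zero
# when the requested file is absent -- plain cp already does that
restore_command = 'cp ../arc/%f "%p"'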


 We tried to fix this problem with my company's colleagues,
 but the WAL control and validation method is too difficult and 
 complicated for us...
 
 This problem occurs at xlog.c:10692.
  10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
  10693 goto next_record_is_invalid;
 
 And the problem occurs at this check in ValidXLOGHeader():
   4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
   4391 {
   4392     ereport(emode_for_corrupt_record(emode, recaddr),
   4393             (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
   4394                     hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
   4395                     readId, readSeg, readOff)));
   4396     return false;
   4397 }
 
 We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
 We cannot analyze why this value is incorrect!

I think the test is just noticing that we don't have valid WAL - we are
probably reading old preallocated WAL files from before the node was
started as a standby. Normally it will get the correct data from the
primary after that (you can see the START_REPLICATION commands in the
log), but that failed.


Greetings,

Andres 

Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:53, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 24.04.2013 11:46, Simon Riggs wrote:

  On 24 April 2013 09:32, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 pg_ctl already checks versions, so I don't see the point.



  The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it
  will
  create the fast_promote file and return success. But it won't actually
  promote
  the server. I think that's bad.

 If pg_ctl already has a check against that, fine, but I don't think it
 does.
 Please make sure you test that before applying.


 If it doesn't check, that is not the only thing that would be broken.
 The original commit of pg_ctl promote would also be broken.


 Yeah, it would've been good if the "pg_ctl promote" patch had added a
 version check. Nevertheless, don't you think it would be good to avoid
 changing the filename of the promote file, so that we don't have any more
 such breakage? I don't see any advantage in changing it.

Apart from all the reasons I already gave, no.

The filename isn't changing. We are adding a new capability and
changing the default.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Hello,

  cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
  [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
  00020013 from archive

 I can't read the error message here, but this looks suspicious.

The error message is "No such file or directory" in Japanese :-).
Perhaps restore_command is 'cp ../arc/%f %p'.

 The recovery command seems to be returning success although it prints such
 an error message?

It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
filename even if the restore command ends up in failure. And the
caller XLogFileOpen successfully ets and returns fd for the filename
if the file exists since before the restoration attempt. It does not
matter when the existing file and the file expected to be restored are
identical in content.

If the segment includes the timeline increment point, something might
happen, I suppose..

regards,

-- 
Kyotaro Horiguchi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Sorry,

 caller XLogFileOpen successfully ets and returns fd for the filename

The caller is XLogFileRead in this case.

# and 'ets' is gets, of course.

regards,
-- 
Kyotaro Horiguchi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
On 2013-04-24 19:16:12 +0900, Kyotaro HORIGUCHI wrote:
 Hello,
 
   cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
   [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
   00020013 from archive
 
  I can't read the error message here, but this looks suspicious.
 
  The error message is "No such file or directory" in Japanese :-).
 Perhaps restore_command is 'cp ../arc/%f %p'.
 
  The recovery command seems to be returning success although it prints such
 an error message?
 
 It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
 filename even if the restore command ends up in failure. And the
 caller XLogFileOpen successfully ets and returns fd for the filename
 if the file exists since before the restoration attempt. It does not
 matter when the existing file and the file expected to be restored are
 identical in content.

But that's not what's happening here; afaics the "restored log file ..."
message is only printed if the return code is 0.

And the return value from RestoreArchivedFile() would be different in
the case you describe, which would make XLogFileRead() fail, so it should
result in different errors.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Oops,

 But that's not what's happening here; afaics the "restored log file ..."
 message is only printed if the return code is 0.

You're right. 'cp nonexistent somewhere' exits with the status
code 1 (or 256?).

The quoted log lines simply show that the segment for tli=3 did not exist
and that the one for tli=2 was finally retrieved. That's quite normal and
irrelevant to the trouble mentioned. Sorry for the confusion.

Unfortunately, the script attached didn't reproduce the situation for
me. But in the log attached,

 [Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from 
 WAL stream: FATAL:  requested WAL segment 00030014 has 
 already been removed
 cp: cannot stat `../arc/00030014': No such file or directory.
 [Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 0/600 
 in log file 0, segment 20, offset 0

seems to show that the standby received a negative ack for its request
for the 20th segment, and 5 seconds later it tried to get it from the
archive and somehow thought it had gotten it, but the header is of the
6th segment.

regards,

-- 
Kyotaro Horiguchi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC Student Project Idea

2013-04-24 Thread Florian Pflug
On Apr 23, 2013, at 23:25, Alexander Korotkov aekorot...@gmail.com wrote:
 I've taken a brief look on the paper and implementation. As I can see 
 iDistance implements some global building strategy. I mean, for example, it 
 selects some point, calculates distances from selected point to all points in 
 dataset etc. So, it uses the whole dataset at the same time. 
 
 However you can try to implement global index building in GiST or SP-GiST. In 
 this case I think you should carefully estimate your capabilities during 
 single GSoC project. You would need to extend GiST or SP-GiST interface and 
 write completely new implementation of tree building algorithm. Question of 
 how to exactly extend GiST or SP-GiST interface for this case could appear to 
 be very hard even theoretically.

+1. That seemed to be a major roadblock to me too when I read the paper.

You could work around that by making partition identification a separate step. 
You'd have a function

  idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in table.field 
and store them somewhere. Such a set of pre-identified partitions would be akin 
to a tsearch configuration, i.e. all other parts of the iDistance machinery 
would use it to map points to index keys and queries to ranges of those keys. 
You'll want to look at how tsearch handles that, and check if the method can 
indeed be applied to iDistance.

In a first cut, you'd probably only allow inserts into the index which don't 
change the maximum distances from the partition centers that idist_analyze() 
found. With that restriction in place, you might get away without GiST or 
SP-GiST, and simply use postgres' standard btree, if you find a way to map 
queries of the form "field idistance-knn-operator 
idistance_knn_function(point, distance)" to "where (idistance_keys(field) 
between P1_lowerbound AND P1_upperbound) OR (idistance_keys(field) between 
P2_lowerbound AND P2_upperbound)" … or something equivalent. Or you could 
start with a GiST-based btree and extend it to support a distance function. 
Looking at contrib/btree_gist and the built-in GiST operator class for point 
types should help.
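
A rough SQL sketch of that btree mapping; idistance_key() and the partition 
bounds are hypothetical stand-ins for the iDistance machinery, not existing 
functions:

-- Hypothetical: idistance_key(p) returns partition_id * stride + distance
CREATE INDEX points_idist_idx ON points (idistance_key(p));

-- A KNN probe then becomes a union of range scans, one per candidate
-- partition (bounds computed from the stored partition centers):
SELECT *
FROM points
WHERE (idistance_key(p) BETWEEN 1000 AND 1025)   -- partition 1
   OR (idistance_key(p) BETWEEN 2010 AND 2040);  -- partition 2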

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add --single-row to psql

2013-04-24 Thread Fabrízio de Royes Mello
On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Isn't there already a way to set FETCH_COUNT from the command line?
 (ie, I think there's a generic variable-assignment facility that could
 do this)


Christopher,

Tom is right... from the psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1]
http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert

2013-04-24 Thread Jeevan Chalke
On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner kgri...@ymail.com wrote:

 Jeevan Chalke jeevan.cha...@enterprisedb.com wrote:
  On Mon, Apr 22, 2013 at 6:41 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2013-04-22 18:35:04 +0530, Jeevan Chalke wrote:

  I have observed that the following sequence causes a server crash.
 
  CREATE MATERIALIZED VIEW temp_class_mv AS
SELECT * FROM pg_class
WITH NO DATA;
 
  CREATE OR REPLACE FUNCTION test_refresh_mv()
  RETURNS int
  AS $$
  BEGIN
   REFRESH MATERIALIZED VIEW temp_class_mv;
   return 1;
  END; $$ LANGUAGE plpgsql;
 
  SELECT test_refresh_mv();
 
 
  I had a quick look at the crash and it is hitting the following Assert in
  spi.c:
 
   else if (IsA(stmt, RefreshMatViewStmt))
   {
       Assert(strncmp(completionTag,
                      "REFRESH MATERIALIZED VIEW ", 23) == 0);
       _SPI_current->processed = strtoul(completionTag + 23,
                                         NULL, 10);
   }
 
  It seems like we are missing the expected value for completionTag in
  ExecRefreshMatView()

  Possibly independent from this issue, but where did that 23 come from?

 When the consensus developed to change the syntax from "LOAD
 MATERIALIZED VIEW", I failed to notice the length here when making
 the changes for that.

  BTW, attached is the patch which works well for me, but needs detailed
 review.

 I suggest that we just rip out this section of code.  Trying to
 provide a number here is probably all wrong, anyway.  As the
 features evolve, there may not be a readily accessible rowcount for
 this command in all cases.


OK.



 Any objections to the attached to fix this issue?


Nope. Fine with me.

Thanks


 --
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.


Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 10:50 AM, Shaun Thomas stho...@optionshouse.com wrote:
 This is most likely a NUMA issue. There really seems to be some kind of
 horrible flaw in the Linux kernel when it comes to properly handling NUMA on
 large memory systems.

Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
idiotic default?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] missing time.h include in psql/command.c since the addition of \watch

2013-04-24 Thread Andres Freund
Hi,

our internal testbuilds show a new warning on windows:
src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern 
returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c it's neither surprising that we get that warning,
nor that we don't get it on non-windows builds. Those get time.h via
portability/instr_time.h.

So it seems we should add a time.h include to command.c. Trivial patch
for that attached.

Not sure why that warning isn't visible on the buildfarm though...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 09939fd..2e74284 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -13,6 +13,8 @@
 #endif
 
 #include <ctype.h>
+#include <time.h>
+
 #ifdef HAVE_PWD_H
 #include <pwd.h>
 #endif

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 release notes suggestions

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 11:41 PM, Bruce Momjian br...@momjian.us wrote:
 Thanks for the many suggestions on improving the 9.3 release notes.
 There were many ideas I would have never thought of.  Please keep the
 suggestions coming.

Bruce,

Thanks for writing them!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Shaun Thomas

On 04/24/2013 08:24 AM, Robert Haas wrote:


Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
idiotic default?


Well... it is. But even on systems where it's not the default or is 
explicitly disabled, there's just something hideously wrong with NUMA in 
general. Take a look at our numa distribution on a heavily loaded system:


available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 14315 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 300 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

What the hell? Seriously? Using numactl and starting in interleave 
didn't fix this, either. It just... arbitrarily ignores a huge chunk of 
memory for no discernible reason.
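

For reference, the mitigations alluded to here (which, as noted, did not
fully help in this case):

# disable per-node reclaim so the kernel uses remote-node memory
# instead of evicting local page cache
sysctl -w vm.zone_reclaim_mode=0

# start the postmaster with memory interleaved across nodes
numactl --interleave=all pg_ctl -D $PGDATA start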


The memory pressure code in Linux is extremely fucked up. I can't find 
it right now, but the memory management algorithm makes some pretty 
ridiculous assumptions once you pass half memory usage, regarding what 
is in active and inactive cache.


I hate to rant, but it gets clearer to me every day that Linux is 
optimized for desktop systems, and generally only kinda works for 
servers. Once you start throwing vast amounts of memory, CPU, and 
processes at it though, things start to get unpredictable.


That all goes back to my earlier threads about how disabling process 
autogrouping via the kernel.sched_autogroup_enabled setting magically 
gave us 20-30% better performance. The optimal setting for a server is 
clearly to disable process autogrouping, and yet it's enabled by 
default, and strongly advocated by Linus himself as a vast improvement.
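

(For reference, the setting in question can be flipped at runtime; add
kernel.sched_autogroup_enabled = 0 to sysctl.conf to persist it:)

sysctl -w kernel.sched_autogroup_enabled=0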


I get it. It's better for desktop systems. But the LAMP stack alone has 
probably a couple orders of magnitude more use cases than Joe Blow's 
Pentium 4 in his basement. Yet it's the latter case that's optimized for.


Servers are getting shafted in a lot of cases, and it's actually 
starting to make me angry.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert

2013-04-24 Thread Kevin Grittner
Jeevan Chalke jeevan.cha...@enterprisedb.com wrote:
 On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner kgri...@ymail.com wrote:

 Any objections to the attached to fix this issue?

 Nope. Fine with me.

Pushed.  Thanks for the report!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Andres Freund
On 2013-04-24 08:39:09 -0500, Shaun Thomas wrote:
 The memory pressure code in Linux is extremely fucked up. I can't find it
 right now, but the memory management algorithm makes some pretty ridiculous
 assumptions once you pass half memory usage, regarding what is in active and
 inactive cache.
 
 I hate to rant, but it gets clearer to me every day that Linux is optimized
 for desktop systems, and generally only kinda works for servers. Once you
 start throwing vast amounts of memory, CPU, and processes at it though,
 things start to get unpredictable.

 That all goes back to my earlier threads about how disabling process 
 autogrouping via the kernel.sched_autogroup_enabled setting magically gave us 
 20-30% better performance. The optimal setting for a server is clearly to 
 disable process autogrouping, and yet it's enabled by default, and strongly
 advocated by Linus himself as a vast improvement.

 I get it. It's better for desktop systems. But the LAMP stack alone has
 probably a couple orders of magnitude more use cases than Joe Blow's Pentium
 4 in his basement. Yet it's the latter case that's optimized for.

IIRC there have been some scalability improvements to that code.

 Servers are getting shafted in a lot of cases, and it's actually starting to
 make me angry.

Uh. Ranting can be a rather healthy thing every now and then and is good
for the soul and such. But. Did you actually try reporting those issues?
In my experience the mm and scheduler folks are rather helpful if they
see you're actually interested in fixing a problem. I have seen rants
about this topic on various pg lists for the last months but I can't
remember seeing mails on lkml about it. How should they fix what they
don't know about?

You know, before Robert got access to the bigger machines we *also* had
some very bad behaviour on them. And our writeout mechanism/buffer
acquisition mechanism still utterly sucks there. But that doesn't mean
we don't care.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Shaun Thomas

On 04/24/2013 08:49 AM, Andres Freund wrote:


Uh. Ranting can be a rather healthy thing every now and then and is good
for the soul and such. But. Did you actually try reporting those issues?


That's actually part of the problem. How do you report:

"Throwing a lot of processes at a high-memory system seems to break the 
mm code in horrible ways."


I'm asking seriously here, because I have no clue how to isolate this 
behavior. It's clearly happening often enough that random people are 
starting to notice now that bigger servers are becoming the norm.


I'm also not a kernel dev in any sense of the word. My C is so rusty, I 
can barely even read the patches going through the ML. I feel 
comfortable posting to PG lists because that's my bread and butter. 
Kernel lists seem way more imposing, and I'm probably not the only one 
who feels that way.


I guess I don't mean to imply that kernel devs don't care. Maybe the 
right way to put it is that there don't seem to be enough kernel devs 
being provided with more capable testing hardware. Which is odd, 
considering Red Hat's involvement and activity on the kernel.


I apologize, though. These last few months have been really frustrating 
thanks to this and other odd kernel-related issues. We've reached an 
equilibrium where the occasional waste of 20GB of RAM doesn't completely 
cripple the system, but this thread kinda struck a sore point. :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Andres Freund
On 2013-04-24 09:06:39 -0500, Shaun Thomas wrote:
 On 04/24/2013 08:49 AM, Andres Freund wrote:
 
  Uh. Ranting can be a rather healthy thing every now and then and is good
  for the soul and such. But. Did you actually try reporting those issues?
  
  That's actually part of the problem. How do you report:
  
  "Throwing a lot of processes at a high-memory system seems to break the mm
  code in horrible ways."

Well. Report memory distribution. Report perf profiles. Ask *them* what
information they need. They aren't grumpy if you are behaving
sensibly. YMMV of course.

 I'm asking seriously here, because I have no clue how to isolate this
 behavior. It's clearly happening often enough that random people are
 starting to notice now that bigger servers are becoming the norm.
 
 I'm also not a kernel dev in any sense of the word. My C is so rusty, I can
 barely even read the patches going through the ML. I feel comfortable
 posting to PG lists because that's my bread and butter. Kernel lists seem
 way more imposing, and I'm probably not the only one who feels that way.

I can understand that. But you had to jump over the fence to post here
once as well ;). Really, report it and see what comes out. The worst
that can happen is that you get a grumpy email ;)
And in the end, jumping might ease the pain in the long run considerably
even if it's uncomfortable at first...

Feel free to CC me.

 I guess I don't mean to imply that kernel devs don't care. Maybe the right
 way to put it is that there don't seem to be enough kernel devs being
 provided with more capable testing hardware. Which is odd, considering Red
 Hat's involvement and activity on the kernel.

There are quite some people using huge servers, but that doesn't imply
they are seeing the same problems. During testing they mostly use a set
of a few benchmarks (part of which is pgbench btw) and apparently they
don't show this problem. Also this is horribly workload and hardware
dependent. There are enough people happily using postgres on linux on
far bigger hardware than what you reported upthread.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Timothy Garnett
Hi All,

Currently the -j option to pg_restore, which allows for parallelization in
the restore, can only be used if the input file is a regular file and not,
for example, a pipe.  However, this is a pretty common occurrence for us
(usually in the form of pg_dump | pg_restore to copy an individual database
or some tables thereof from one machine to another).  While there's no good
way to parallelize the data load steps when reading from a pipe, the index
and constraint building can still be parallelized, and as they are generally
CPU bound on our machines we've found quite a bit of speedup from doing so.
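
The pattern in question looks like this (hosts and job count illustrative;
stock pg_restore rejects -j here because stdin is not a seekable regular
file, which is what the patches change):

pg_dump -Fc -h srchost mydb | pg_restore -j 4 -h dsthost -d mydb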

Attached are two diffs off of the REL9_2_4 tag that I've been using.  The
first is a simple change that serially loads the data section before
handing off the remainder of the restore to the existing parallelized
restore code (the .ALT. diff).  The second, which gets more parallelization
but is a bit more of a change, uses the existing dependency analysis code to
allow index building etc. to occur in parallel with data loading. The data
loading tasks are still performed serially in the main thread, but non-data
loading tasks are scheduled in parallel as their dependencies are satisfied
(with the caveat that the main thread can only dispatch new tasks between
data loads).

Anyways, the question is whether people think this is generally useful.  If so I
can clean up the preferred choice a bit and rebase it off of master, etc.

Tim


0003-patch-pg_restore-to-allow-parallel-restore-when-the.ALT.patch
Description: Binary data


0003-patch-pg_restore-to-allow-parallel-restore-when-the-.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing time.h include in psql/command.c since the addition of \watch

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 16:27, Andres Freund wrote:

Hi,

our internal testbuilds show a new warning on windows:
src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern 
returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c it's neither surprising that we get that warning,
nor that we don't get it on non-windows builds. Those get time.h via
portability/instr_time.h.

So it seems we should add a time.h include to command.c. Trivial patch
for that attached.


Thanks, applied.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] putting a bgworker to rest

2013-04-24 Thread Dimitri Fontaine
Andres Freund and...@2ndquadrant.com writes:
 How would postmaster know when to restart a worker that stopped?

 I had imagined we would assign some return codes special
 meaning. Currently 0 basically means restart immediately; 1 means
 crashed, wait for some time; everything else results in a postmaster
 restart. It seems we can just assign return code 2 as "done", probably
 with some enum or such hiding the numbers.

In Erlang, the lib that cares about such things is called OTP, and it
proposes a model of supervisor that knows when to restart a worker. The
specs for the restart behaviour are:

  Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.

  - A permanent child process is always restarted.

  - A temporary child process is never restarted (not even when the
supervisor's restart strategy is rest_for_one or one_for_all and a
sibling's death causes the temporary process to be terminated).

  - A transient child process is restarted only if it terminates
abnormally, i.e. with another exit reason than normal, shutdown or
{shutdown,Term}.

Then about restart frequency, what they have is:

The supervisors have a built-in mechanism to limit the number of
restarts which can occur in a given time interval. This is
determined by the values of the two parameters MaxR and MaxT in the
start specification returned by the callback function [ ... ]

If more than MaxR number of restarts occur in the last MaxT seconds,
then the supervisor terminates all the child processes and then
itself.

You can read the whole thing here:

http://www.erlang.org/doc/design_principles/sup_princ.html#id71215

I think we should get some inspiration from them here.
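
A sketch of how those specifications could map onto bgworker registration --
illustrative only; 9.3's actual API instead exposes bgw_restart_time (a
delay in seconds, or BGW_NEVER_RESTART):

typedef enum BgwRestart
{
    BGW_RESTART_PERMANENT,      /* always restart after exit */
    BGW_RESTART_TRANSIENT,      /* restart only after abnormal exit */
    BGW_RESTART_TEMPORARY       /* never restart; "done" is a final state */
} BgwRestart;

typedef struct BgwRestartSpec
{
    BgwRestart  policy;
    int         max_restarts;   /* OTP's MaxR */
    int         interval_secs;  /* OTP's MaxT */
} BgwRestartSpec;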

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Beta1 status report

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 06:22, Bruce Momjian wrote:

On Tue, Apr 23, 2013 at 06:56:34PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote:

Do we usually repeat the changes listed in the backwards
compatibility section later, in the Changes section? If not, then
instead of the first two items above, let's just have these in the
backwards-compatibility section:


 We do not repeat the incompatible items later in release notes.  I have
added some of your text to one of the items, and added a mention that
tooling will need adjustment.  Please check the post-commit version and
let me know about adjustments.


Let me clarify --- changes to our WAL binary format and source code
changes are not really incompatibilities from a user perspective, as we
never promised to do our best to minimize such changes --- meaning the
fact that the WAL format changed is something that would be only in the
source code section and not in the incompatibilities section ---
incompatibilities are related to changes in user experience or
documented-API changes.


These guidelines make sense, except I think the change in the naming
standard of xlog segments is important to document clearly because, even
if we have not promised compatibility, people could be relying on it in
scripts.  I think it makes sense to waste a couple of lines documenting
this change, even if we expect the number of people to be bitten by it
to be very low.


Right. Kevin mentioned he had a script that knew about the numbering: 
http://www.postgresql.org/message-id/4fd09b5e022500048...@gw.wicourts.gov.



Agreed.  Here is the new text:

 Store WAL in a continuous stream, rather than skipping the last
 16MB segment every 4GB (Heikki Linnakangas)  BACKWARD COMPATIBLE BREAK

 Previously, WAL files ending in FF were not used.  If you have
 WAL backup or restore scripts that took that skipping into account,
 they need to be adjusted.
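
To make the skipping concrete (segment names below are made up for
illustration):

 pre-9.3:  ..., 0000000100000000000000FE, 000000010000000100000000
           (the FF segment was skipped)
 9.3+:     ..., 0000000100000000000000FE, 0000000100000000000000FF,
           000000010000000100000000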


Looks good, thanks!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] putting a bgworker to rest

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 1:22 PM, Andres Freund and...@2ndquadrant.com wrote:
 So a done worker would never be restarted, until postmaster sees a
 crash or is itself restarted?  I guess that'd be useful for workers
 running during recovery, which terminate when recovery completes.  Is
 that your use case?

 Well, its not actual postgres recovery, but something similar in the
 context of logical replication.

It's probably too late to be twiddling this very much more, but
another thing I think would be useful is for backends to have the
ability to request that the postmaster start a worker of type xyz,
rather than having the server start it automatically at startup time.
That's what you'd need for parallel query, and there might be some
replication-related use cases for such things as well.  The general
usage pattern would be (a hypothetical API sketch follows the list):

- regular backend realizes that it needs help
- kicks postmaster to start a helper process
- helper process runs for a while, doing work
- helper process finishes work, maybe waits around for some period of
time to see if any new work arrives, and then exits
- eventually go back to step 1
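
A purely hypothetical interface for that pattern -- the names below are
invented (PostgreSQL 9.4 later added RegisterDynamicBackgroundWorker for
roughly this purpose):

#include <stdbool.h>

typedef struct HelperWorkerRequest
{
    const char *worker_type;    /* e.g. "parallel_sort_helper" */
    int         nworkers;       /* how many helpers to launch */
} HelperWorkerRequest;

/* Called from a regular backend; signals the postmaster to fork workers. */
extern bool RequestHelperWorkers(const HelperWorkerRequest *req);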

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Examples Re: [HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates

2013-04-24 Thread David Fetter
On Tue, Apr 23, 2013 at 09:57:27AM -0700, David Fetter wrote:
 Folks,
 
 While testing the upcoming FILTER clause for aggregates, Erik Rijkers
 uncovered a long-standing bug in $subject, namely that this case
 wasn't handled.  Please find attached a patch by Andrew Gierth and
 myself which fixes this issue and adds a regression test to ensure it
 remains fixed.

Please see below the results when I run the regression test query on git
master, REL9_2_STABLE, and REL9_1_STABLE, respectively:

$ psql postgres
psql (9.3devel)
Type "help" for help.

shackle@postgres:5493=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^

$ psql postgres
psql (9.2.4)
Type "help" for help.

shackle@postgres:5492=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^

$ psql postgres
psql (9.1.9)
Type "help" for help.

shackle@postgres:5491=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^
Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

  thanks a lot for the responses
 
1) just to remind you of my case

Intel 32 core = 2*8*2 threads
Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 
4.4.5-8) 4.4.5, 64-bit
shared_buffers 64GB / constant hit rate - 99,18
max_connections 160 / with pgbouncer pools there could not be more than 120 
connections at all
work_mem 32M
checkpoint 1h 1.0
swap off
numa off, interleaving on 
and
! disks usage 100% (free 128GB! WHY?)
disk throughput - up to 30MB/s (24r+6w)
io - up to 2.5-3K/s (0.5w + 2-2.5r)
typical work load - pk-index-scans
my warm work set is about 400GB
db at all - 700GB

2) numactl 
mtyurin@avi-sql09:~$ numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 0 size: 393181 MB
node 0 free: 146029 MB
node distances:
node   0
  0:  10

3) !! i just found a suspicious relation between active processes and free 
memory. ~1GB per process.
in 376GB total memory and 32 cores,
if ( user cpu + io wait ) is ~140% then i have ~140GB free.
but it could be just a coincidence.

4) now i think
a) upgrade the linux kernel (to 3.2!?) and then (if the case still occurs)
b) set buffers to something like 300-320GB

5) what do you know about the workload in Berkus's case 
http://www.databasesoup.com/2012/04/red-hat-kernel-cache-clearing-issue.html ?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

typo

 if ( user cpu + io wait ) is ~140% then i have ~140GB free.

140% === 1400%

if ~14 cores are busy then ~140GB is free

10GB per process

hmmm...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин
vm state

root@avi-sql09:~# /sbin/sysctl -a|grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 0
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256   256 32
vm.drop_caches = 0
vm.min_free_kbytes = 65536
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 65536
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC Student Project Idea

2013-04-24 Thread Michael Schuh
On Wed, Apr 24, 2013 at 5:31 AM, Florian Pflug f...@phlo.org wrote:

 On Apr23, 2013, at 23:25 , Alexander Korotkov aekorot...@gmail.com
 wrote:
  I've taken a brief look at the paper and implementation. As far as I can see,
 iDistance implements some global building strategy. I mean, for example, it
 selects some point, calculates distances from the selected point to all points
 in the dataset, etc. So, it uses the whole dataset at the same time.
 
  However you can try to implement global index building in GiST or
 SP-GiST. In this case I think you should carefully estimate your
 capabilities during a single GSoC project. You would need to extend the GiST or
 SP-GiST interface and write a completely new implementation of the tree-building
 algorithm. The question of how exactly to extend the GiST or SP-GiST interface
 for this case could turn out to be very hard, even theoretically.

 +1. That seemed to be a major roadblock to me too when I read the paper.

 You could work around that by making partition identification a separate
 step. You'd have a function

   idist_analyze(cfg name, table name, field name)

 which'd identify suitable partitions for the data distribution in
 table.field and store them somewhere. Such a set of pre-identified
 partitions would be akin to a tsearch configuration, i.e. all other parts
 of the iDistance machinery would use it to map points to index keys and
 queries to ranges of those keys. You'll want to look at how tsearch handles
 that, and check if the method can indeed be applied to iDistance.

  In a first cut, you'd probably only allow inserts into the index which don't
 change the maximum distances from the partition centers that
 idist_analyze() found. With that restriction in place, you might get away
 without GiST or SP-GiST, and simply use postgres' standard btree, if you
 find a way to map queries of the form field idistance-knn-operator
 idistance_knn_function(point, distance) to where (idistance_keys(field)
 between P1_lowerbound AND P1_upperbound) OR (idistance_keys(field) between
 P2_lowerbound AND P2_upperbound) … or something equivalent.



Thank you both for the very helpful feedback. Perhaps the scope of this
project (the application's completeness criteria) is better framed as
feasibility prototyping of the global/distance-based index strategy with
B+-tree and/or GiST extension possibilities.

Let me briefly elaborate on the current implementation details related to
your comments. The public C++ version uses a stand-alone STX B+-tree
implementation, and has only been used in the research context of one-time
data loading and indexing and then KNN retrieval performance efficiency.
This requires upfront global partition building (a sequential pass over all
data points) and a bit of overhead information about the partitions (such
as reference point locations and maximum distances in each). Of course,
there are a lot of improvements, modifications, variants, etc., that can be
built from this basic setup... but that's mostly my in-progress thesis work,
not yet published or defended :)

A given query is broken down into range(s) of key values in the B+-tree,
based on the negligible overhead info kept from partitioning and on where
the partitions are located with respect to the query. Only then does this
small subset of pages need to be loaded from disk. Therefore it is necessary
to have left/right leaf pointers within the B+-tree. While I think a
DBMS-integrated implementation would be more ideal for general use by
everyone, my naive assumption is that I could probably implement the
iDistance functionality in stored procedures and metadata tables (at the
cost of performance and usability).
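
To make the key mapping concrete, here is a minimal C sketch of the scheme
described above. It is an illustration only, under assumed names (Partition,
KEY_STRIDE) and a fixed 2-D Euclidean metric, not code from the prototype:

#include <math.h>
#include <stdio.h>

/* One reference point per partition, chosen by the upfront analyze step. */
typedef struct
{
    double  cx, cy;         /* partition center */
    double  max_dist;       /* radius covered by this partition */
} Partition;

#define KEY_STRIDE 1000.0   /* disjoint key interval per partition */

/*
 * iDistance key for a 2-D point: index of the nearest partition times the
 * stride, plus the distance from the point to that partition's center.
 */
static double
idistance_key(double x, double y, const Partition *p, int nparts)
{
    int     best = 0;
    double  best_d = HUGE_VAL;

    for (int i = 0; i < nparts; i++)
    {
        double  d = hypot(x - p[i].cx, y - p[i].cy);

        if (d < best_d)
        {
            best_d = d;
            best = i;
        }
    }
    return best * KEY_STRIDE + best_d;
}

/*
 * B+-tree key range to scan in partition i for a query ball of radius r
 * whose center lies at distance dq from the partition center; distances
 * outside [0, max_dist] cannot occur within the partition.
 */
static void
idistance_range(int i, double dq, double r, const Partition *p,
                double *lo, double *hi)
{
    double  dmin = dq - r > 0.0 ? dq - r : 0.0;
    double  dmax = dq + r < p[i].max_dist ? dq + r : p[i].max_dist;

    *lo = i * KEY_STRIDE + dmin;
    *hi = i * KEY_STRIDE + dmax;
}

int
main(void)
{
    Partition   parts[2] = {{0, 0, 400}, {500, 500, 400}};
    double      lo, hi;

    printf("key = %.1f\n", idistance_key(10, 20, parts, 2));
    idistance_range(0, hypot(10, 20), 5.0, parts, &lo, &hi);
    printf("scan keys [%.1f, %.1f] in partition 0\n", lo, hi);
    return 0;
}

A KNN search probes one such contiguous key range per intersecting partition,
which is exactly why the left/right leaf pointers matter.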

Best regards,
Mike Schuh


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/23/2013 07:53 PM, Timothy Garnett wrote:

Hi All,

Currently the -j option to pg_restore, which allows for 
parallelization in the restore, can only be used if the input file is 
a regular file and not, for ex., a pipe.  However this is a pretty 
common occurrence for us (usually in the form of pg_dump | pg_restore 
to copy an individual database or some tables thereof from one machine 
to another).  While there's no good way to parallelize the data load 
steps when reading from a pipe, the index and constraint building can 
still be parallelized and as they are generally CPU bound on our 
machines we've found quite a bit of speedup from doing so.


Attached are two diffs off of the REL9_2_4 tag that I've been using. 
The first is a simple change that serially loads the data section 
before handing off the remainder of the restore to the existing 
parallelized restore code (the .ALT. diff).  The second, which gets 
more parallelization but is a bit more of a change, uses the existing 
dependency analysis code to allow index building etc. to occur in 
parallel with data loading. The data loading tasks are still performed 
serially in the main thread, but non-data loading tasks are scheduled 
in parallel as their dependencies are satisfied (with the caveat that 
the main thread can only dispatch new tasks between data loads).


Anyways, the question is if people think this is generally useful.  If 
so I can clean up the preferred choice a bit and rebase it off of 
master, etc.



I don't think these are bad ideas at all, and probably worth doing. Note 
that there are some fairly hefty changes affecting this code in master, 
so your rebasing could be tricky.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Dimitri Fontaine
Andrew Dunstan and...@dunslane.net writes:
 On 04/23/2013 07:53 PM, Timothy Garnett wrote:
 Anyways, the question is if people think this is generally useful.  If so
 I can clean up the preferred choice a bit and rebase it off of master,
 etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/23/2013 07:53 PM, Timothy Garnett wrote:

Anyways, the question is if people think this is generally useful.  If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.




That's not going to work, the output from parallel pg_dump is inherently 
multiple streams. That's why it ONLY supports directory format, and not 
even custom format on disk, let alone a pipe.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/24/2013 03:49 PM, Andrew Dunstan wrote:


On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful.  
If so

I can clean up the preferred choice a bit and rebase it off of master,
etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sound something worthy of doing too.




That's not going to work, the output from parallel pg_dump is 
inherently multiple streams. That's why it ONLY supports directory 
format, and not even custom format on disk, let alone a pipe.





What might make sense is something like pg_dump_restore which would have 
no intermediate storage at all, just pump the data etc from one source 
to another in parallel. But I pity the poor guy who has to write it :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Stefan Kaltenbrunner
On 04/24/2013 09:51 PM, Andrew Dunstan wrote:
 
 On 04/24/2013 03:49 PM, Andrew Dunstan wrote:

 On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:
 Andrew Dunstan and...@dunslane.net writes:
 On 04/23/2013 07:53 PM, Timothy Garnett wrote:
 Anyways, the question is if people think this is generally useful. 
 If so
 I can clean up the preferred choice a bit and rebase it off of master,
 etc.
 I find this idea very useful yes.

 Another idea would be to allow for parallel pg_dump output to somehow be
 piped into a parallel pg_restore. I don't know how to solve that at all,
 it just sound something worthy of doing too.



 That's not going to work, the output from parallel pg_dump is
 inherently multiple streams. That's why it ONLY supports directory
 format, and not even custom format on disk, let alone a pipe.

 
 
 What might make sense is something like pg_dump_restore which would have
 no intermediate storage at all, just pump the data etc from one source
 to another in parallel. But I pity the poor guy who has to write it :-)

hmm, pretty sure that Joachim's initial patch for parallel dump actually
had a PoC for something very similar to that...


Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 08:20 +0100, Simon Riggs wrote:
 On 24 April 2013 01:10, Jeff Davis pg...@j-davis.com wrote:
  I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
  checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
  a new algorithm without forcing existing users to change.
 
 That's exactly what the patch does.

The word version indicates an order to it though, like N+1 is always
preferable to N. This is user-facing (through pg_controldata output),
otherwise I wouldn't mind.

  initdb would have to take the code as an option, probably in string
  form.
 
 When/if we have multiple options we can add that. The main thing was
 to make sure the control file recorded things in a common way.

The main strange thing to me is that we're still using enabled/disabled
for the output of pg_controldata as well as the version.

When we do have multiple options, it seems like we'd just have one field
output:

  Data page checksums: none|crc32c|pg-fnv
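
To illustrate, a hedged sketch of what that single-field mapping could look
like; only codes 0 and 1 exist today, and all labels here are hypothetical
rather than anything committed:

#include <stdio.h>

typedef unsigned int uint32;    /* stand-in for PostgreSQL's uint32 */

/*
 * Hypothetical code-to-label mapping: 0 is off, 1 is the 9.3 FNV-based
 * algorithm, anything else is a made-up future option.
 */
static const char *
checksum_name(uint32 data_checksum_version)
{
    switch (data_checksum_version)
    {
        case 0:
            return "none";
        case 1:
            return "pg-fnv";    /* hypothetical label for FNV-1a-SR3 */
        case 2:
            return "crc32c";    /* hypothetical future algorithm */
        default:
            return "unknown";
    }
}

int
main(void)
{
    uint32  version = 1;        /* as read from pg_control */

    printf("Data page checksums:                  %s\n",
           checksum_name(version));
    return 0;
}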

What goal are you trying to accomplish with this patch? pg_control
doesn't need to be compatible between releases, so can't we just add
this later when we really do have multiple options?

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 21:06, Jeff Davis pg...@j-davis.com wrote:

 What goal are you trying to accomplish with this patch?

That we might need to patch the checksum version on a production release.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Joachim Wieland
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner 
ste...@kaltenbrunner.cc wrote:

  What might make sense is something like pg_dump_restore which would have
  no intermediate storage at all, just pump the data etc from one source
  to another in parallel. But I pity the poor guy who has to write it :-)

  hmm, pretty sure that Joachim's initial patch for parallel dump actually
  had a PoC for something very similar to that...


That's right, I implemented that as its own output format and named it
"migrator", I think; it wouldn't write each stream to a file as the
directory output format does, but instead pumps the data back into a restore
client.

Actually I think the logic was even reversed, it was a parallel restore
that got the data from internally calling pg_dump functionality instead of
from reading files... The neat thing about this approach was that the order
was optimized and correct, i.e. largest tables start first and dependencies
get resolved in the right order.

I could revisit that patch for 9.4 if enough people are interested.

Joachim


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Claudio Freire
On Wed, Apr 24, 2013 at 6:47 PM, Joachim Wieland j...@mcknight.de wrote:
 On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner
 ste...@kaltenbrunner.cc wrote:

  What might make sense is something like pg_dump_restore which would have
  no intermediate storage at all, just pump the data etc from one source
  to another in parallel. But I pity the poor guy who has to write it :-)

  hmm, pretty sure that Joachim's initial patch for parallel dump actually
  had a PoC for something very similar to that...


 That's right, I implemented that as its own output format and named it
 "migrator", I think; it wouldn't write each stream to a file as the
 directory output format does, but instead pumps the data back into a restore
 client.

 Actually I think the logic was even reversed, it was a parallel restore that
 got the data from internally calling pg_dump functionality instead of from
 reading files... The neat thing about this approach was that the order was
 optimized and correct, i.e. largest tables start first and dependencies get
 resolved in the right order.

 I could revisit that patch for 9.4 if enough people are interested.


Indeed... I've wasted hours copying databases for test environments,
when that could've been an hour (singular).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add --single-row to psql

2013-04-24 Thread Christopher Manning
Fabrízio and Tom,

I know that you can use --variable=FETCH_COUNT=1 from the
psql command line, but internally that uses a CURSOR to batch the rows, and
[Redshift doesn't support CURSOR](
https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
not an option when using psql to download data from Redshift.

Pavel's idea of having a single-row mode option to replace FETCH_COUNT is
interesting; does anyone have any problems with that, or alternative ideas?
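
For reference, libpq already exposes row-at-a-time retrieval (since 9.2) via
PQsetSingleRowMode(), which is presumably what a psql single-row option would
sit on top of; a minimal sketch, with error handling trimmed:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");   /* connection parameters from environment */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    if (!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    if (!PQsetSingleRowMode(conn))        /* must follow PQsendQuery immediately */
        fprintf(stderr, "could not enable single-row mode\n");

    /* Each row arrives as its own PGRES_SINGLE_TUPLE result. */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            printf("%s\n", PQgetvalue(res, 0, 0));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}

Unlike the FETCH_COUNT approach this does not use a cursor, so it would also
work against servers like Redshift that lack CURSOR support.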

Regards,
Christopher



On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Isn't there already a way to set FETCH_COUNT from the command line?
  (ie, I think there's a generic variable-assignment facility that could
  do this)
 

 Christopher,

 Tom is right... from the psql [1] command line we can do that:

 $ bin/psql --variable=FETCH_COUNT=100
 psql (9.3devel)
 Type "help" for help.

 fabrizio=# \echo :FETCH_COUNT
 100

 Regards,

 [1]
 http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello



[HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Jeff Janes
With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
after a crash the postmaster will try to delete all files in the directory
stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
fine. But it seems rather hostile when it is set to a shared directory,
like the popular /dev/shm.

Previously, it would only try to delete the one file /dev/shm/pgstat.stat,
so the danger was much smaller.

If /dev/shm is used, this only comes into play when postgres has crashed
but the OS has not (if the OS has crashed, /dev/shm will be empty anyway
when it comes back), so perhaps this is not such a large exposure.

The docs don't discuss the issue of what happens if stats_temp_directory is
set to a non-private (to PGDATA) directory.  The docs also do not
explicitly recommend using /dev/shm, but there are plenty of examples of
that usage that come up on google (and no examples of using a private
subdirectory of /dev/shm rather than /dev/shm itself).

Does this need to be fixed, or at least documented?

Cheers,

Jeff


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
I had a quick look at it and came up with a hypothesis.. umm, or a
scenario.

==

 Just after a restartpoint, xlog files that are too old are recycled,
but their page headers still carry the old content, specifically the
old xlogid and xrecoff.

Plus, if the master's LSN is at the head of a new segment file, the
file for that segment may not have been created yet, and the WAL send
request for that LSN from the standby might fail with "requested
WAL segment %s has already been removed", even though the segment
has not been created yet...

 So the standby receives a nack for the request and looks in the
archive, but the file is, properly, not there. However, a file for
that segment is certainly found in the pg_xlog directory. So
XLogFileRead grabs the file with the old content and...
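
To make the hypothesized failure concrete, here is a self-contained C sketch
of the 9.2-era address check, using simplified stand-ins for the xlog.c types;
the stale address chosen below is an assumption consistent with the
header-from-the-6th-segment observation:

#include <stdio.h>
#include <stdint.h>

/* Simplified stand-ins for the 9.2-era xlog.c types, illustration only. */
typedef struct
{
    uint32_t    xlogid;
    uint32_t    xrecoff;
} XLogRecPtr;

#define XLogSegSize (16 * 1024 * 1024)  /* 16 MB WAL segments */

int
main(void)
{
    uint32_t    readId = 0;     /* log file 0 ... */
    uint32_t    readSeg = 20;   /* ... segment 20, as in the logs below */
    uint32_t    readOff = 0;

    /* The address ValidXLOGHeader() expects the page header to carry. */
    XLogRecPtr  recaddr = {readId, readSeg * XLogSegSize + readOff};

    /*
     * What a recycled-but-not-yet-overwritten page still carries: its old
     * segment's address -- here segment 6's start, assumed per the
     * observation that the header belongs to the 6th segment.
     */
    XLogRecPtr  pageaddr = {0, 6 * XLogSegSize};

    if (pageaddr.xlogid != recaddr.xlogid ||
        pageaddr.xrecoff != recaddr.xrecoff)
        printf("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u\n",
               (unsigned) pageaddr.xlogid, (unsigned) pageaddr.xrecoff,
               (unsigned) readId, (unsigned) readSeg, (unsigned) readOff);
    return 0;
}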

 This seems to show that the standby received a negative ack for the
 request for the 20th segment, and 5 seconds later it tried to get it
 from the archive and somehow thought it had gotten it, but the header
 is from the 6th segment.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Alvaro Herrera
Jeff Janes wrote:
 With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
 after a crash the postmaster will try to delete all files in the directory
 stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
 fine. But it seems rather hostile when it is set to a shared directory,
 like the popular /dev/shm.

 Does this need to be fixed, or at least documented?

I think we need it fixed so that it only deletes the files matching a
well-known pattern.  In fact, there's an XXX comment about this in the
code:

/* XXX should we try to ignore files other than the ones we write? */
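
A hedged sketch of such a filter; the name patterns assumed here (global.stat,
db_<oid>.stat, plus .tmp intermediates) follow the 9.3 split-file scheme, but
the exact set would need to be checked against pgstat.c:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
has_suffix(const char *name, const char *suffix)
{
    size_t  nlen = strlen(name);
    size_t  slen = strlen(suffix);

    return nlen > slen && strcmp(name + nlen - slen, suffix) == 0;
}

/*
 * Accept only files the stats collector itself writes into
 * stats_temp_directory; anything else living in a shared directory
 * such as /dev/shm is left alone.  (Assumed name patterns.)
 */
static bool
is_pgstat_temp_file(const char *name)
{
    if (!has_suffix(name, ".stat") && !has_suffix(name, ".tmp"))
        return false;
    return strncmp(name, "db_", 3) == 0 ||
           strncmp(name, "global", 6) == 0 ||
           strncmp(name, "pgstat", 6) == 0;
}

int
main(void)
{
    /* prints "1 0": the stats file matches, the foreign file does not */
    printf("%d %d\n",
           is_pgstat_temp_file("db_16384.stat"),
           is_pgstat_temp_file("some_other_app.shm"));
    return 0;
}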

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread KONDO Mitsumasa

Hi,


I found a problem where start-up archive recovery fails in Standby mode in 
PG9.2.4 streaming replication.
I tested the same scenario in PG9.2.3, but it does not occur there...



cp: cannot stat `../arc/00030013': No such file or directory
[Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
00020013 from archive


I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

Sorry.. that suspicious message is written in Japanese.
It means "No such file or directory". That is because the starting point of the 
timeline-3 WAL files is 0003002A.
 

cp: cannot stat `../arc/00030014': No such file or directory

[* This is where the invalid behavior occurs] [Standby] 2013-04-22 01:27:25 EDTLOG:  0: 
unexpected pageaddr 0/600 in log file 0, segment 20, offset 0

[Standby] 2013-04-22 01:27:25 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': No such file or directory
cp: cannot stat `../arc/0004.history': No such file or directory
[Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication command: 
IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication command: 
START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:25 EDTFATAL:  58P01: requested WAL segment 
00030014 has already been removed
[Master] 2013-04-22 01:27:25 EDTLOCATION:  XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDTLOG:  0: streaming replication 
successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_connect, 
libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from WAL 
stream: FATAL:  requested WAL segment 00030014 has already been 
removed

[Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_receive, 
libpqwalreceiver.c:389

[mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat 
`../arc/00030014': No such file or directory
[Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 0/600 
in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': No such file or directory
cp: cannot stat `../arc/0004.history': No such file or directory
[Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication command: 
IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication command: 
START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:30 EDTFATAL:  58P01: requested WAL segment 
00030014 has already been removed
[Master] 2013-04-22 01:27:30 EDTLOCATION:  XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDTLOG:  0: streaming replication 
successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDTLOCATION:  libpqrcv_connect, 
libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDTFATAL:  XX000: could not receive data from WAL 
stream: FATAL:  requested WAL segment 00030014 has already been 
removed


And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary, or you need to fix your
restore_command.

In this situation, I am trying to advance the timelineID by archive recovery, not 
just recover WAL files.
If archive recovery works correctly on the Standby, the recovery end point is 
00020029 in timelineID 2.
Please see the details in pg9.2.4.log. The state of the archive files is here.
 

We tried to fix this problem with my company's colleagues,
but the WAL control and validation methods are too difficult and complicated...

This problem occurs at xlog.c:10692.

10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693 goto next_record_is_invalid;


And the problem occurs at this check in ValidXLOGHeader().

  4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
  4391 {
  4392 ereport(emode_for_corrupt_record(emode, recaddr),
  4393 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
  4394 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
  4395 readId, readSeg, readOff)));
  4396 return false;
  4397 }


We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
We cannot figure out why this value is incorrect!


I think the test is just noticing that we don't have valid WAL - we are

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 21:09 +0100, Simon Riggs wrote:
 On 24 April 2013 21:06, Jeff Davis pg...@j-davis.com wrote:
 
  What goal are you trying to accomplish with this patch?
 
 That we might need to patch the checksum version on a production release.

Oh, I see.

I don't think we need two output fields from pg_controldata though. It's
a little redundant, and confused me when I was looking at the impact on
pg_upgrade. And it means nothing to the user until we actually have
multiple algorithms available, at which time we are better off with a
text representation.

Other than that, I think your patch is fine to accomplish the
aforementioned goal. Essentially, it just changes the bool to a uint32,
which I favor.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Jeff Janes wrote:
 With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
 after a crash the postmaster will try to delete all files in the directory
 stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
 fine. But it seems rather hostile when it is set to a shared directory,
 like the popular /dev/shm.

 Does this need to be fixed, or at least documented?

 I think we need it fixed so that it only deletes the files matching a
 well-known pattern.

I think we need it fixed to reject any stats_temp_directory that is not
postgres-owned with restrictive permissions.  The problem here is not
with what it deletes, it's with the insanely insecure configuration.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Please add discussion topics for cluster-hackers meeting

2013-04-24 Thread Josh Berkus
Folks,

The 2013 cluster-hackers meeting is less than a month away.  If you are
attending, and have topics of common interest to discuss, such as core
APIs (i.e. event triggers are in, at least partway), please add your
topic to the wiki here:

https://wiki.postgresql.org/wiki/PgCon2013CanadaClusterSummit#10:30AM_to_11:30AM

... and make sure you're on the list of attendees, so that you get lunch!

Also, note that a 3-hour PostgresXC summit will follow the general
cluster-hackers meeting.  Anyone interested in working on XC is
encouraged to attend.

--Josh Berkus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers