Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 01:10, Jeff Davis  wrote:
> On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote:
>> * make the pg_control.data_checksums field into a version number, for
>> future flexibility...
>> patch attached
>
> Commenting on this separately because it's a separate issue.
>
> I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
> checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
> a new algorithm without forcing existing users to change.

That's exactly what the patch does.

> initdb would have to take the code as an option, probably in string
> form.

When/if we have multiple options we can add that. The main thing was
to make sure the control file recorded things in a common way.
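For illustration only, the shape of that scheme — an integer algorithm ID in pg_control rather than a boolean — can be sketched as below. The ID registry and the plain 32-bit FNV-1a shown are assumptions standing in for the thread's modified "FNV-1a-SR3"; none of this is the actual patch code.

```python
# Sketch: pg_control stores an integer checksum "version" instead of a
# boolean, mapping ID -> algorithm so new algorithms can be added later.
# The IDs and the plain FNV-1a here are illustrative assumptions.

FNV_OFFSET_BASIS = 2166136261
FNV_PRIME = 16777619

def fnv1a_32(data):
    # Standard 32-bit FNV-1a, standing in for the patch's modified variant
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFF
    return h

# 0 = checksums disabled; 1 = first supported algorithm; a future release
# can register 2, 3, ... without forcing existing users to change
CHECKSUM_ALGORITHMS = {0: None, 1: fnv1a_32}

def page_checksum(version, page):
    algo = CHECKSUM_ALGORITHMS[version]
    return algo(page) if algo is not None else None
```

A cluster initialized with ID 1 keeps verifying pages with that algorithm even after a later release registers ID 2.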

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 22.04.2013 18:44, Simon Riggs wrote:

On 22 April 2013 09:29, Heikki Linnakangas  wrote:


Hmm. That requires write access to $DATADIR, so that's not quite the same
thing as the trigger_file recovery.conf option.


Well, you also (elsewhere) requested that we must keep recovery.conf
in $DATADIR, so it needs to be writable.


That's a slightly different requirement. $DATADIR must be writable by 
the process that restores the backup or puts the server into standby 
mode, while trigger_file needs to be writable by the process that 
triggers failover. Those are not necessarily the same thing. I'm 
thinking of a heartbeat process that triggers failover by creating a 
file on an NFS server or similar. Possibly the same location where the 
WAL archive is located. $DATADIR would be stored elsewhere.
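A minimal sketch of that division of labor (paths and names invented for illustration): the heartbeat only needs write access to the directory holding the trigger file, which can be an NFS mount nowhere near $DATADIR.

```python
# Sketch: failover is triggered by creating trigger_file on shared storage,
# so the heartbeat process needs no write access to the standby's $DATADIR.
import os
import tempfile

def trigger_failover(trigger_path):
    # Creating the file is the whole protocol -- equivalent to `touch`
    with open(trigger_path, "w"):
        pass

def standby_should_promote(trigger_path):
    # The standby side merely polls for the file's existence
    return os.path.exists(trigger_path)

# Demo setup: a temp dir stands in for the shared NFS mount
shared_mount = tempfile.mkdtemp()
trigger = os.path.join(shared_mount, "failover.trigger")
```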


- Heikki




Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 08:23, Heikki Linnakangas  wrote:
> On 22.04.2013 18:44, Simon Riggs wrote:
>>
>> On 22 April 2013 09:29, Heikki Linnakangas
>> wrote:
>>
>>> Hmm. That requires write access to $DATADIR, so that's not quite the same
>>> thing as the trigger_file recovery.conf option.
>>
>>
>> Well, you also (elsewhere) requested that we must keep recovery.conf
>> in $DATADIR, so it needs to be writable.
>
>
> That's a slightly different requirement. $DATADIR must be writable by the
> process that restores the backup or puts the server into standby mode, while
> trigger_file needs to be writable by the process that triggers failover.
> Those are not necessarily the same thing. I'm thinking of a heartbeat
> process that triggers failover by creating a file on an NFS server or
> similar. Possibly the same location where the WAL archive is located.
> $DATADIR would be stored elsewhere.

The default you've requested is fast promotion and I've agreed to that.

The ability to write a file called "promote" to $DATADIR is there as a
protection in case we need it in the field, it's not going to be the
primary mechanism any more. So if you're not intending to use it ever,
it doesn't seem worth discussing the fact you don't like its location.

But if you do want to discuss it, I think it's unreasonable of you to
demand recovery.conf cannot be outside $DATADIR and then also demand
related files be relocatable outside $DATADIR as well, the exact
opposite. We had the chance to avoid making $DATADIR writable
externally and that's gone now, at least for now.

Here's the patch I was intending to apply. Please let me know if you
have comments.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


set_fast_promotion_as_default.v1.patch
Description: Binary data



Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 10:57, Simon Riggs wrote:

On 24 April 2013 08:23, Heikki Linnakangas  wrote:

On 22.04.2013 18:44, Simon Riggs wrote:


On 22 April 2013 09:29, Heikki Linnakangas
wrote:


Hmm. That requires write access to $DATADIR, so that's not quite the same
thing as the trigger_file recovery.conf option.


Well, you also (elsewhere) requested that we must keep recovery.conf
in $DATADIR, so it needs to be writable.


That's a slightly different requirement. $DATADIR must be writable by the
process that restores the backup or puts the server into standby mode, while
trigger_file needs to be writable by the process that triggers failover.
Those are not necessarily the same thing. I'm thinking of a heartbeat
process that triggers failover by creating a file on an NFS server or
similar. Possibly the same location where the WAL archive is located.
$DATADIR would be stored elsewhere.


The default you've requested is fast promotion and I've agreed to that.

The ability to write a file called "promote" to $DATADIR is there as a
protection in case we need it in the field, it's not going to be the
primary mechanism any more. So if you're not intending to use it ever,
it doesn't seem worth discussing the fact you don't like its location.


Ok, works for me.


But if you do want to discuss it, I think it's unreasonable of you to
demand recovery.conf cannot be outside $DATADIR and then also demand
related files be relocatable outside $DATADIR as well, the exact
opposite. We had the chance to avoid making $DATADIR writable
externally and that's gone now, at least for now.


As I said above, it's a different situation. recovery.conf has always 
been in $DATADIR, and it's always been possible to point trigger_file 
elsewhere, and you've always gotten full functionality using the 
trigger_file. I just want to maintain that status quo. Which your patch 
achieves, so I'm happy with that.



Here's the patch I was intending to apply. Please let me know if you
have comments.


Regarding the change in pg_ctl:


/*
-* Use two different kinds of promotion file so we can understand
-* the difference between smart and fast promotion.
+* For 9.3 onwards, use fast promotion as the default option.
+* Promotion with a full checkpoint is still possible by writing
+* a file called "promote", e.g.
+*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
 */
-   if (shutdown_mode >= FAST_MODE)
-   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);
-   else
-   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
+   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);


Should there be a version check there? I guess we've never guaranteed a 
newer pg_ctl to work with an older server version, but it seems likely 
that someone would try to do that, especially with "pg_ctl promote". 
With the above change, creating $DATADIR/fast_promote in a 9.2 server's 
data dir will do nothing. I'd suggest that we keep the filename 
unchanged, "promote", and only change the behavior on the server side, 
so that it performs fast promotion. If you want to have a "slow" promote 
file, we can call that "slow_promote" or "checkpoint_then_promote" or 
something.


Aside from that, I assume you'll clean up the now-dead -m fast option etc.

- Heikki




Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:10, Heikki Linnakangas  wrote:
> Regarding the change in pg_ctl:
>
>> /*
>> -* Use two different kinds of promotion file so we can understand
>> -* the difference between smart and fast promotion.
>> +* For 9.3 onwards, use fast promotion as the default option.
>> +* Promotion with a full checkpoint is still possible by writing
>> +* a file called "promote", e.g.
>> +*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
>>  */
>> -   if (shutdown_mode >= FAST_MODE)
>> -   snprintf(promote_file, MAXPGPATH, "%s/fast_promote",
>> pg_data);
>> -   else
>> -   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
>> +   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);
>
>
> Should there be a version check there? I guess we've never guaranteed a
> newer pg_ctl to work with an older server version, but it seems likely that
> someone would try to do that, especially with "pg_ctl promote". With the
> above change, creating $DATADIR/fast_promote in a 9.2 server's data dir will
> do nothing. I'd suggest that we keep the filename unchanged, "promote", and
> only change the behavior in the server side, so that it performs fast
> promotion. If you want to have a "slow" promote file, we can call that
> "slow_promote" or "checkpoint_then_promote" or something.

pg_ctl already checks versions, so I don't see the point.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 11:23, Simon Riggs wrote:

On 24 April 2013 09:10, Heikki Linnakangas  wrote:

Regarding the change in pg_ctl:


 /*
-* Use two different kinds of promotion file so we can understand
-* the difference between smart and fast promotion.
+* For 9.3 onwards, use fast promotion as the default option.
+* Promotion with a full checkpoint is still possible by writing
+* a file called "promote", e.g.
+*   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
  */
-   if (shutdown_mode>= FAST_MODE)
-   snprintf(promote_file, MAXPGPATH, "%s/fast_promote",
pg_data);
-   else
-   snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
+   snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);



Should there be a version check there? I guess we've never guaranteed a
newer pg_ctl to work with an older server version, but it seems likely that
someone would try to do that, especially with "pg_ctl promote". With the
above change, creating $DATADIR/fast_promote in a 9.2 server's data dir will
do nothing. I'd suggest that we keep the filename unchanged, "promote", and
only change the behavior in the server side, so that it performs fast
promotion. If you want to have a "slow" promote file, we can call that
"slow_promote" or "checkpoint_then_promote" or something.


pg_ctl already checks versions, so I don't see the point.


The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it 
will create fast_promote file and return success. But it won't actually 
promote the server. I think that's bad.


If pg_ctl already has a check against that, fine, but I don't think it 
does. Please make sure you test that before applying.
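For what it's worth, the kind of guard being discussed could look roughly like this: compare the data directory's PG_VERSION against the major version pg_ctl was built for before writing the promote file. A sketch of the idea only, not pg_ctl's actual code; the "9.3" constant is an assumption.

```python
# Sketch of a version guard for "pg_ctl promote": refuse to write a promote
# file into a data directory from a different major version, where it would
# be silently ignored.
import os
import tempfile

PG_CTL_MAJOR = "9.3"  # assumed: the version this pg_ctl was built for

def datadir_major_version(datadir):
    # Every cluster records its major version in $DATADIR/PG_VERSION
    with open(os.path.join(datadir, "PG_VERSION")) as f:
        return f.read().strip()

def promote_supported(datadir):
    return datadir_major_version(datadir) == PG_CTL_MAJOR

# Demo: a mock 9.2 data directory should be rejected
demo_datadir = tempfile.mkdtemp()
with open(os.path.join(demo_datadir, "PG_VERSION"), "w") as f:
    f.write("9.2\n")
```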


- Heikki




Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:32, Heikki Linnakangas  wrote:

>> pg_ctl already checks versions, so I don't see the point.
>
>
> The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it will
> create fast_promote file and return success. But it won't actually promote
> the server. I think that's bad.
>
> If pg_ctl already has a check against that, fine, but I don't think it does.
> Please make sure you test that before applying.

If it doesn't check, that is not the only thing that would be broken.
The original commit of pg_ctl promote would also be broken.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 11:46, Simon Riggs wrote:

On 24 April 2013 09:32, Heikki Linnakangas  wrote:


pg_ctl already checks versions, so I don't see the point.



The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it will
create fast_promote file and return success. But it won't actually promote
the server. I think that's bad.

If pg_ctl already has a check against that, fine, but I don't think it does.
Please make sure you test that before applying.


If it doesn't check, that is not the only thing that would be broken.
The original commit of pg_ctl promote would also be broken.


Yeah, it would've been good if the "pg_ctl promote" patch had added 
a version check. Nevertheless, don't you think it would be good to avoid 
changing the filename of the "promote" file, so that we don't have any 
more such breakage? I don't see any advantage in changing it.


- Heikki




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
Hello,

On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote:
> Hi,
> 
> I found a problem with failing start-up archive recovery in Standby mode in 
> PG9.2.4 streaming replication.
> I tested the same scenario in PG9.2.3, but it does not occur there...

> > cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
> > "00020013" from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> [* This point is illegal work]
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: unexpected pageaddr 0/600 in log file 0, segment 20, offset 0
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication 
> > command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
> > walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication 
> > command: START_REPLICATION 0/1400
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
> > walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDTFATAL:  58P01: requested WAL segment 
> > 00030014 has already been removed
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: streaming replication 
> > successfully connected to primary
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_connect, 
> > libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from 
> > WAL stream: FATAL:  requested WAL segment 00030014 has 
> > already been removed
> > 
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_receive, 
> > libpqwalreceiver.c:389
> > 
> > [mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat 
> > `../arc/00030014': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 
> > 0/600 in log file 0, segment 20, offset 0
> > [Standby] 2013-04-22 01:27:30 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication 
> > command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
> > walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication 
> > command: START_REPLICATION 0/1400
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
> > walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDTFATAL:  58P01: requested WAL segment 
> > 00030014 has already been removed
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:30 EDTLOG:  0: streaming replication 
> > successfully connected to primary
> > [Standby] 2013-04-22 01:27:30 EDTLOCATION:  libpqrcv_connect, 
> > libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:30 EDTFATAL:  XX000: could not receive data from 
> > WAL stream: FATAL:  requested WAL segment 00030014 has 
> > already been removed

And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary, or you need to fix your
restore_command.
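For reference, the two remedies live in different files; the values below are examples only (wal_keep_segments retains extra 16MB segments on the primary, and restore_command follows the 'cp ../arc/%f %p' pattern the log excerpts suggest):

```ini
# postgresql.conf on the primary: keep more finished WAL segments around
# so a briefly-lagging standby can still stream them (example value)
wal_keep_segments = 64

# recovery.conf on the standby: restore_command must return nonzero when
# the requested file cannot be restored
restore_command = 'cp ../arc/%f "%p"'
```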


> We tried to fix this problem with my company's colleagues,
> but the WAL control and validation logic is too difficult and complicated...
> 
> This problem occurs at xlog.c:10692.
> > 10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
> > 10693 goto next_record_is_invalid;
> 
> And the problem occurs at this check in ValidXLOGHeader():
> >  4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
> >  4391 {
> >  4392 ereport(emode_for_corrupt_record(emode, recaddr),
> >  4393 (errmsg("unexpected pageaddr %X/%X in 
> > log file %u, segment %u, offset %u",
> >  4394 
> > hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
> >  4395 readId, readSeg, 
> > readOff)));
> >  4396 return false;
> >  4397 }
> 
> We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
> We cannot work out why this value is incorrect!

I think the test is just noticing that we don't have valid WAL - we are
probably reading old preallocated WAL files from before starting the
node as a standby. Norma

Re: [HACKERS] Fast promotion, loose ends

2013-04-24 Thread Simon Riggs
On 24 April 2013 09:53, Heikki Linnakangas  wrote:
> On 24.04.2013 11:46, Simon Riggs wrote:
>>
>> On 24 April 2013 09:32, Heikki Linnakangas
>> wrote:
>>
 pg_ctl already checks versions, so I don't see the point.
>>>
>>>
>>>
>>> The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it
>>> will
>>> create fast_promote file and return success. But it won't actually
>>> promote
>>> the server. I think that's bad.
>>>
>>> If pg_ctl already has a check against that, fine, but I don't think it
>>> does.
>>> Please make sure you test that before applying.
>>
>>
>> If it doesn't check, that is not the only thing that would be broken.
>> The original commit of pg_ctl promote would also be broken.
>
>
> Yeah, it would've been good if the "pg_ctl promote" patch would've added a
> version check. Nevertheless, don't you think it would be good to avoid
> changing the filename of the "promote" file, so that we don't have any more
> such breakage? I don't see any advantage in changing it.

Apart from all the reasons I already gave, no.

The filename isn't changing. We are adding a new capability and
changing the default.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Hello,

> > cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
> > "00020013" from archive

> I can't read the error message here, but this looks suspicious.

The error message is "No such file or directory" in Japanese :-).
Perhaps restore_command is 'cp ../arc/%f %p'.

> The recovery command seems to be returning success although it prints such
> an error message?

It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
filename even if the restore command ends up in failure. And the
caller XLogFileOpen successfully ets and returns fd for the filename
if the file exists since before the restoration attempt. It does not
matter when the existing file and the file expected to be restored are
identical in content.

If the segment includes the timeline increment point, something might
happen, I suppose..
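A simplified model of the failure mode being hypothesized here — an illustration only, not the actual RestoreArchivedFile/XLogFileRead code, and the hypothesis itself is disputed downthread:

```python
# Hypothetical model of the suspected behavior: if the restore command fails
# but a file with the expected name already exists (left over from an earlier
# attempt), recovery opens the stale file anyway.
import os
import subprocess
import tempfile

def restore_archived_file(restore_cmd, xlogpath):
    rc = subprocess.call(restore_cmd, shell=True)
    if rc == 0:
        return xlogpath                  # normal case: freshly restored
    if os.path.exists(xlogpath):
        return xlogpath                  # suspect case: stale file accepted
    return None                          # segment genuinely unavailable

# Demo: a stale segment left behind from a previous restoration attempt
pg_xlog = tempfile.mkdtemp()
stale = os.path.join(pg_xlog, "000000020000000000000013")
open(stale, "w").close()
```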

regards,

-- 
Kyotaro Horiguchi




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Sorry,

> caller XLogFileOpen successfully ets and returns fd for the filename

The caller is XLogFileRead in this case.

# and 'ets' is gets, of course.

regards,
-- 
Kyotaro Horiguchi




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Andres Freund
On 2013-04-24 19:16:12 +0900, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> > > cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
> > > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
> > > "00020013" from archive
> 
> > I can't read the error message here, but this looks suspicious.
> 
> The error message is "No such file or directory" in Japanese :-).
> Perhaps restore_command is 'cp ../arc/%f %p'.
> 
> > The recovery command seems to be returning success although it prints such
> an error message?
> 
> It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
> filename even if the restore command ends up in failure. And the
> caller XLogFileOpen successfully ets and returns fd for the filename
> if the file exists since before the restoration attempt. It does not
> matter when the existing file and the file expected to be restored are
> identical in content.

But that's not what's happening here; AFAICS the "restored log file ..."
message is only printed if the return code is 0.

And the return value from RestoreArchivedFile() would be different in
the case you describe which would make XLogFileRead() fail, so it should
result in different errors.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
Oops,

> But thats not what happening here, afaics the "restore log file ..."
> message is only printed if the returncode is 0.

You're right. 'cp  ' exits with the status
code 1 (or 256?).

The quoted log lines simply show that the segment for tli=3 did not exist
and that the one for tli=2 was finally retrieved. That's quite normal and
irrelevant to the trouble mentioned. Sorry for the confusion.

Unfortunately, the script attached didn't reproduce the situation for
me. But in the log attached,

> [Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from 
> WAL stream: FATAL:  requested WAL segment 00030014 has 
> already been removed
> cp: cannot stat `../arc/00030014': No such file or directory.
> [Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 0/600 
> in log file 0, segment 20, offset 0

seems to show that the standby received a negative ack for its request
for the 20th segment and, 5 seconds later, tried to get it from the
archive, somehow thought it had succeeded — but the header belongs to the
6th segment.

regards,

-- 
Kyotaro Horiguchi




Re: [HACKERS] GSOC Student Project Idea

2013-04-24 Thread Florian Pflug
On Apr23, 2013, at 23:25 , Alexander Korotkov  wrote:
> I've taken a brief look on the paper and implementation. As I can see 
> iDistance implements some global building strategy. I mean, for example, it 
> selects some point, calculates distances from selected point to all points in 
> dataset etc. So, it uses the whole dataset at the same time. 
> 
> However you can try to implement global index building in GiST or SP-GiST. In 
> this case I think you should carefully estimate your capabilities during 
> single GSoC project. You would need to extend GiST or SP-GiST interface and 
> write completely new implementation of tree building algorithm. Question of 
> how to exactly extend GiST or SP-GiST interface for this case could appear to 
> be very hard even theoretically.

+1. That seemed to be a major roadblock to me too when I read the paper.

You could work around that by making partition identification a separate step. 
You'd have a function

  idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in table.field 
and store them somewhere. Such a set of pre-identified partitions would be akin 
to a tsearch configuration, i.e. all other parts of the iDistance machinery 
would use it to map points to index keys and queries to ranges of those keys. 
You'll want to look at how tsearch handles that, and check if the method can 
indeed be applied to iDistance.

In a first cut, you'd probably only allow inserts into index which don't change 
the maximum distances from the partition centers that idist_analyze() found. 
With that restriction in place, you might get away without GiST or SP-GiSt, and 
simply use postgres' standard btree, if you find a way to map queries of the 
form "field  idistance_knn_function(point, distance)" 
to "where (idistance_keys(field) between P1_lowerbound AND P1_upperbound) OR 
(idistance_keys(field) between P2_lowerbound AND P2_upperbound) …" or something 
equivalent. Or you could start with a GiST-based btree and extend it to support 
a "distance" function. Looking at contrib/btree_gist and the built-in GiST 
operator class for point types should help.
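A rough sketch of the key mapping described above; the partition centers, the constant C, and all function names are illustrative assumptions rather than anything from the iDistance paper's actual implementation:

```python
# Sketch of iDistance-style key mapping: each point is indexed under the
# one-dimensional key  key = partition_id * C + dist(point, center),
# so a distance query becomes a union of ordinary btree range scans.
import math

C = 1000.0  # assumed constant, larger than any point-to-center distance

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def idistance_key(point, centers):
    # Assign the point to its nearest pre-identified partition center
    i, center = min(enumerate(centers), key=lambda ic: dist(point, ic[1]))
    return i * C + dist(point, center)

def query_ranges(q, radius, centers, max_radius):
    # One (lower, upper) key range per partition the query ball can reach;
    # max_radius[i] bounds how far partition i extends from its center
    ranges = []
    for i, center in enumerate(centers):
        d = dist(q, center)
        if d - radius <= max_radius[i]:
            lo = i * C + max(0.0, d - radius)
            hi = i * C + min(max_radius[i], d + radius)
            ranges.append((lo, hi))
    return ranges
```

Each returned range would correspond to one `idistance_keys(field) BETWEEN lo AND hi` clause in the rewritten query.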

best regards,
Florian Pflug





Re: [HACKERS] Proposal to add --single-row to psql

2013-04-24 Thread Fabrízio de Royes Mello
On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane  wrote:
>
> Isn't there already a way to set FETCH_COUNT from the command line?
> (ie, I think there's a generic variable-assignment facility that could
> do this)
>

Christopher,

Tom is right... from the psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1]
http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert

2013-04-24 Thread Jeevan Chalke
On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner  wrote:

> Jeevan Chalke  wrote:
> > On Mon, Apr 22, 2013 at 6:41 PM, Andres Freund 
> wrote:
> >> On 2013-04-22 18:35:04 +0530, Jeevan Chalke wrote:
>
> >>> I have observed that following sequence is causing server crash.
> >>>
> >>> CREATE MATERIALIZED VIEW temp_class_mv AS
> >>>   SELECT * FROM pg_class
> >>>   WITH NO DATA;
> >>>
> >>> CREATE OR REPLACE FUNCTION test_refresh_mv()
> >>> RETURNS int
> >>> AS $$
> >>> BEGIN
> >>>  REFRESH MATERIALIZED VIEW temp_class_mv;
> >>>  return 1;
> >>> END; $$ LANGUAGE plpgsql;
> >>>
> >>> SELECT test_refresh_mv();
> >>>
> >>>
> >>> I had a quick look over the crash and it is hitting following Assert in
> >>> spi.c:
> >>>
> >>> else if (IsA(stmt, RefreshMatViewStmt))
> >>> {
> >>>     Assert(strncmp(completionTag,
> >>>            "REFRESH MATERIALIZED VIEW ", 23) == 0);
> >>>     _SPI_current->processed = strtoul(completionTag + 23,
> >>>                                       NULL, 10);
> >>> }
> >>>
> >>> It seems like we are missing expected value for completionTag in
> >>> ExecRefreshMatView()
>
> >> Possibly independent from this issue, but where did that 23 come from?
>
> When the consensus developed to change the syntax from LOAD
> MATERIALIZED VIEW I failed to noticed the length here when making
> the changes for that.
>
> > BTW, attached is the patch which works well for me, but need details
> review.
>
> I suggest that we just rip out this section of code.  Trying to
> provide a number here is probably all wrong, anyway.  As the
> features evolve, there may not be a readily accessible rowcount for
> this command in all cases.
>

OK.
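To make the length mix-up concrete: "LOAD MATERIALIZED VIEW " is 23 characters while "REFRESH MATERIALIZED VIEW " is 26, so after the rename the Assert compared a truncated prefix and strtoul started parsing mid-word. A quick illustration (in Python rather than spi.c's C, with the offset derived from the literal):

```python
# The stale constant: 23 is the length of the old command tag, not the new
# one, so both the Assert comparison and the strtoul offset went wrong.
OLD_TAG = "LOAD MATERIALIZED VIEW "
NEW_TAG = "REFRESH MATERIALIZED VIEW "

def rows_processed(completion_tag):
    # Equivalent of strtoul(completionTag + offset, NULL, 10), but with
    # the offset derived from the tag literal instead of hard-coded
    assert completion_tag.startswith(NEW_TAG)
    return int(completion_tag[len(NEW_TAG):])
```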


>
> Any objections to the attached to fix this issue?
>

Nope. Fine with me.

Thanks

>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb



Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 10:50 AM, Shaun Thomas  wrote:
> This is most likely a NUMA issue. There really seems to be some kind of
> horrible flaw in the Linux kernel when it comes to properly handling NUMA on
> large memory systems.

Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
idiotic default?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] missing time.h include in psql/command.c since the addition of \watch

2013-04-24 Thread Andres Freund
Hi,

our internal testbuilds show a new warning on windows:
src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern 
returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c it's neither surprising that we get that warning,
nor that we don't get it on non-Windows builds. Those get time.h via
portability/instr_time.h.

So it seems we should add a time.h include to command.c. Trivial patch
for that attached.

Not sure why that warning isn't visible on the buildfarm though...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 09939fd..2e74284 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -13,6 +13,8 @@
 #endif
 
 #include <ctype.h>
+#include <time.h>
+
 #ifdef HAVE_PWD_H
 #include <pwd.h>
 #endif



Re: [HACKERS] 9.3 release notes suggestions

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 11:41 PM, Bruce Momjian  wrote:
> Thanks for the many suggestions on improving the 9.3 release notes.
> There were many ideas I would have never thought of.  Please keep the
> suggestions coming.

Bruce,

Thanks for writing them!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Shaun Thomas

On 04/24/2013 08:24 AM, Robert Haas wrote:


Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
idiotic default?


Well... it is. But even on systems where it's not the default or is 
explicitly disabled, there's just something hideously wrong with NUMA in 
general. Take a look at our numa distribution on a heavily loaded system:


available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 14315 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 300 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

What the hell? Seriously? Using numactl and starting in interleave 
didn't fix this, either. It just... arbitrarily ignores a huge chunk of 
memory for no discernible reason.


The memory pressure code in Linux is extremely fucked up. I can't find 
it right now, but the memory management algorithm makes some pretty 
ridiculous assumptions once you pass half memory usage, regarding what 
is in active and inactive cache.


I hate to rant, but it gets clearer to me every day that Linux is 
optimized for desktop systems, and generally only kinda works for 
servers. Once you start throwing vast amounts of memory, CPU, and 
processes at it though, things start to get unpredictable.


That all goes back to my earlier threads noting that disabling process 
autogrouping via the kernel.sched_autogroup_enabled setting magically 
gave us 20-30% better performance. The optimal setting for a server is 
clearly to disable process autogrouping, and yet it's enabled by 
default, and strongly advocated by Linus himself as a vast improvement.


I get it. It's better for desktop systems. But the LAMP stack alone has 
probably a couple orders of magnitude more use cases than Joe Blow's 
Pentium 4 in his basement. Yet it's the latter case that's optimized for.


Servers are getting shafted in a lot of cases, and it's actually 
starting to make me angry.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email




Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert

2013-04-24 Thread Kevin Grittner
Jeevan Chalke  wrote:
> On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner  wrote:

>> Any objections to the attached to fix this issue?

> Nope. Fine with me.

Pushed.  Thanks for the report!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Andres Freund
On 2013-04-24 08:39:09 -0500, Shaun Thomas wrote:
> The memory pressure code in Linux is extremely fucked up. I can't find it
> right now, but the memory management algorithm makes some pretty ridiculous
> assumptions once you pass half memory usage, regarding what is in active and
> inactive cache.
> 
> I hate to rant, but it gets clearer to me every day that Linux is optimized
> for desktop systems, and generally only kinda works for servers. Once you
> start throwing vast amounts of memory, CPU, and processes at it though,
> things start to get unpredictable.

> That all goes back to my earlier threads that disabling process autogrouping
> via the kernel.sched_autogroup_enabled setting, magically gave us 20-30%
> better performance. The optimal setting for a server is clearly to disable
> process autogrouping, and yet it's enabled by default, and strongly
> advocated by Linus himself as a vast improvement.

> I get it. It's better for desktop systems. But the LAMP stack alone has
> probably a couple orders of magnitude more use cases than Joe Blow's Pentium
> 4 in his basement. Yet it's the latter case that's optimized for.

IIRC there have been some scalability improvements to that code.

> Servers are getting shafted in a lot of cases, and it's actually starting to
> make me angry.

Uh. Ranting can be a rather healthy thing every now and then, and it's good
for the soul and such. But. Did you actually try reporting those issues?
In my experience the mm and scheduler folks are rather helpful if they
see you're actually interested in fixing a problem. I have seen rants
about this topic on various pg lists for the last months but I can't
remember seeing mails on lkml about it. How should they fix what they
don't know about?

You know, before Robert got access to the bigger machine we *also* had
some very bad behaviour on them. And our writeout mechanism/buffer
acquisition mechanism still utterly sucks there. But that doesn't mean
we don't care.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Shaun Thomas

On 04/24/2013 08:49 AM, Andres Freund wrote:


Uh. Ranting can be a rather healthy thing every now and then, and it's good
for the soul and such. But. Did you actually try reporting those issues?


That's actually part of the problem. How do you report:

Throwing a lot of processes at a high-memory system seems to break the 
mm code in horrible ways.


I'm asking seriously here, because I have no clue how to isolate this 
behavior. It's clearly happening often enough that random people are 
starting to notice now that bigger servers are becoming the norm.


I'm also not a kernel dev in any sense of the word. My C is so rusty, I 
can barely even read the patches going through the ML. I feel 
comfortable posting to PG lists because that's my bread and butter. 
Kernel lists seem way more imposing, and I'm probably not the only one 
who feels that way.


I guess I don't mean to imply that kernel devs don't care. Maybe the 
right way to put it is that there don't seem to be enough kernel devs 
being provided with more capable testing hardware. Which is odd, 
considering Red Hat's involvement and activity on the kernel.


I apologize, though. These last few months have been really frustrating 
thanks to this and other odd kernel-related issues. We've reached an 
equilibrium where the occasional waste of 20GB of RAM doesn't completely 
cripple the system, but this thread kinda struck a sore point. :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Andres Freund
On 2013-04-24 09:06:39 -0500, Shaun Thomas wrote:
> On 04/24/2013 08:49 AM, Andres Freund wrote:
> 
> >Uh. Ranting can be a rather healthy thing every now and then, and it's good
> >for the soul and such. But. Did you actually try reporting those issues?
> 
> That's actually part of the problem. How do you report:
> 
> Throwing a lot of processes at a high-memory system seems to break the mm
> code in horrible ways.

Well. Report memory distribution. Report perf profiles. Ask *them* what
information they need. They aren't grumpy if you are behaving
sensibly. YMMV of course.

> I'm asking seriously here, because I have no clue how to isolate this
> behavior. It's clearly happening often enough that random people are
> starting to notice now that bigger servers are becoming the norm.
> 
> I'm also not a kernel dev in any sense of the word. My C is so rusty, I can
> barely even read the patches going through the ML. I feel comfortable
> posting to PG lists because that's my bread and butter. Kernel lists seem
> way more imposing, and I'm probably not the only one who feels that way.

I can understand that. But you had to jump over the fence to post here
once as well ;). Really, report it and see what comes out. The worst
that can happen is that you get a grumpy email ;)
And in the end, jumping might ease the pain in the long run considerably,
even if it's uncomfortable at first...

Feel free to CC me.

> I guess I don't mean to imply that kernel devs don't care. Maybe the right
> way to put it is that there don't seem to be enough kernel devs being
> provided with more capable testing hardware. Which is odd, considering Red
> Hat's involvement and activity on the kernel.

There are quite a few people using huge servers, but that doesn't imply
they are seeing the same problems. During testing they mostly use a set
of a few benchmarks (part of which is pgbench btw) and apparently they
don't show this problem. Also this is horribly workload and hardware
dependent. There are enough people happily using postgres on linux on
far bigger hardware than what you reported upthread.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Timothy Garnett
Hi All,

Currently the -j option to pg_restore, which allows for parallelization in
the restore, can only be used if the input file is a regular file and not,
for ex., a pipe.  However this is a pretty common occurrence for us
(usually in the form of pg_dump | pg_restore to copy an individual database
or some tables thereof from one machine to another).  While there's no good
way to parallelize the data load steps when reading from a pipe, the index
and constraint building can still be parallelized and as they are generally
CPU bound on our machines we've found quite a bit of speedup from doing so.

Attached are two diffs off of the REL9_2_4 tag that I've been using.  The
first is a simple change that serially loads the data section before
handing off the remainder of the restore to the existing parallelized
restore code (the .ALT. diff).  The second which gets more parallelization
but is a bit more of a change uses the existing dependency analysis code to
allow index building etc. to occur in parallel with data loading. The data
loading tasks are still performed serially in the main thread, but non-data
loading tasks are scheduled in parallel as their dependencies are satisfied
(with the caveat that the main thread can only dispatch new tasks between
data loads).

Anyways, the question is if people think this is generally useful.  If so I
can clean up the preferred choice a bit and rebase it off of master, etc.

Tim


0003-patch-pg_restore-to-allow-parallel-restore-when-the.ALT.patch
Description: Binary data


0003-patch-pg_restore-to-allow-parallel-restore-when-the-.patch
Description: Binary data



Re: [HACKERS] missing time.h include in psql/command.c since the addition of \watch

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 16:27, Andres Freund wrote:

Hi,

our internal testbuilds show a new warning on windows:
src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern 
returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming 
extern returning int 
[C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c its neither surprising that we get that warning,
nor that we don't get it on non-windows builds. Those get time.h via
portability/instr_time.h.

So it seems we should add a time.h include to command.c. Trivial patch
for that attached.


Thanks, applied.

- Heikki




Re: [HACKERS] putting a bgworker to rest

2013-04-24 Thread Dimitri Fontaine
Andres Freund  writes:
>> How would postmaster know when to restart a worker that stopped?
>
> I had imagined we would assign some return codes special
> meaning. Currently 0 basically means "restart immediately", 1 means
> "crashed, wait for some time", everything else results in a postmaster
> restart. It seems we can just assign return code 2 as "done", probably
> with some enum or such hiding the numbers.

In Erlang, the lib that cares about such things in called OTP, and that
proposes a model of supervisor that knows when to restart a worker. The
specs for the restart behaviour are:

  Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.

  - A permanent child process is always restarted.

  - A temporary child process is never restarted (not even when the
supervisor's restart strategy is rest_for_one or one_for_all and a
sibling's death causes the temporary process to be terminated).

  - A transient child process is restarted only if it terminates
abnormally, i.e. with another exit reason than normal, shutdown or
{shutdown,Term}.

Then about restart frequency, what they have is:

The supervisors have a built-in mechanism to limit the number of
restarts which can occur in a given time interval. This is
determined by the values of the two parameters MaxR and MaxT in the
start specification returned by the callback function [ ... ]

If more than MaxR number of restarts occur in the last MaxT seconds,
then the supervisor terminates all the child processes and then
itself.

You can read the whole thing here:

http://www.erlang.org/doc/design_principles/sup_princ.html#id71215

I think we should get some inspiration from them here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] 9.3 Beta1 status report

2013-04-24 Thread Heikki Linnakangas

On 24.04.2013 06:22, Bruce Momjian wrote:

On Tue, Apr 23, 2013 at 06:56:34PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote:

Do we usually repeat the changes listed in the backwards
compatibility section later, in the "Changes" section? If not, then
instead of the first two items above, let's just have these in the
backwards-compatibility section:


We do not repeat the incompatible items later in release notes.  I have
added some of your text to one of the items, and added a mention that
tooling will need adjustment.  Please check the post-commit version and
let me know about adjustments.


Let me clarify --- changes to our WAL binary format and source code
changes are not really incompatibilities from a user perspective, as we
never promised to minimize such changes --- meaning the fact that the
WAL format changed is something that belongs only in the source code
section and not in the "incompatibilities" section ---
incompatibilities relate to changes in user experience or
documented-API changes.


These guidelines make sense, except I think the change in naming
standard of xlog segments is important to document clearly because, even
if we have not promised compatibility, people could be relying on it in
scripts.  I think it makes sense to waste a couple of lines documenting
this change, even if we expect the number of people to be bitten by it
to be very low.


Right. Kevin mentioned he had a script that knew about the numbering: 
http://www.postgresql.org/message-id/4fd09b5e022500048...@gw.wicourts.gov.



Agreed.  Here is the new text:

 Store WAL in a continuous stream, rather than skipping the last
 16MB segment every 4GB (Heikki Linnakangas)  BACKWARD COMPATIBLE BREAK

 Previously, WAL files ending in FF were not used.  If you have
 WAL backup or restore scripts that took that skipping into account,
 they need to be adjusted.


Looks good, thanks!

- Heikki




Re: [HACKERS] putting a bgworker to rest

2013-04-24 Thread Robert Haas
On Tue, Apr 23, 2013 at 1:22 PM, Andres Freund  wrote:
>> So a "done" worker would never be restarted, until postmaster sees a
>> crash or is itself restarted?  I guess that'd be useful for workers
>> running during recovery, which terminate when recovery completes.  Is
>> that your use case?
>
> Well, its not actual postgres recovery, but something similar in the
> context of logical replication.

It's probably too late to be twiddling this very much more, but
another thing I think would be useful is for backends to have the
ability to request that the postmaster start a worker of type xyz,
rather than having the server start it automatically at startup time.
That's what you'd need for parallel query, and there might be some
replication-related use cases for such things as well.  The general
usage pattern would be:

- regular backend realizes that it needs help
- kicks postmaster to start a helper process
- helper process runs for a while, doing work
- helper process finishes work, maybe waits around for some period of
time to see if any new work arrives, and then exits
- eventually go back to step 1

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Examples Re: [HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates

2013-04-24 Thread David Fetter
On Tue, Apr 23, 2013 at 09:57:27AM -0700, David Fetter wrote:
> Folks,
> 
> While testing the upcoming FILTER clause for aggregates, Erik Rijkers
> uncovered a long-standing bug in $subject, namely that this case
> wasn't handled.  Please find attached a patch by Andrew Gierth and
> myself which fixes this issue and adds a regression test to ensure it
> remains fixed.

Please see below results when I run the regression test query on git
master, REL_9_2_STABLE, and REL9_1_STABLE, respectively:

$ psql postgres
psql (9.3devel)
Type "help" for help.

shackle@postgres:5493=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^

$ psql postgres
psql (9.2.4)
Type "help" for help.

shackle@postgres:5492=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^

$ psql postgres
psql (9.1.9)
Type "help" for help.

shackle@postgres:5491=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE 
"POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR:  collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...
  ^
Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] Re[2]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

  thanks a lot for responses
 
1) just remind my case

Intel 32 core = 2*8 *2threads
Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 
4.4.5-8) 4.4.5, 64-bit
shared_buffers 64GB / constant hit rate - 99,18
max_connections 160 / with pgbouncer pools there could not be more than 120 
connections at all
work_mem 32M
checkpoint 1h 1.0
swap off
numa off, interleaving on 
and
! disks usage 100% (free 128GB! WHY?)
disk throughput - up-to 30MB/s (24r+6w)
io - up-to 2,5-3K/s (0,5w + 2-2,5r)
typical work load - pk-index-scans
my warm work set is about 400GB
db at all - 700GB

2) numactl 
mtyurin@avi-sql09:~$ numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 0 size: 393181 MB
node 0 free: 146029 MB
node distances:
node   0
  0:  10

3) !! i just found a suspicious relation between "active" processes and free 
memory. ~1GB per process.
in 376GB total memmory and 32 core
if ( user cpu + io wait ) is ~140% then i have ~140GB free.
but it could be just a coincidence.

4) now i think
a) upgrade linux core (to 3.2!?) and then (if case still will be)
b) set buffers to something like 300-320Gb

5) what do you know about workload in Berkus's case 
http://www.databasesoup.com/2012/04/red-hat-kernel-cache-clearing-issue.html ?




[HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин

typo

> if ( user cpu + io wait ) is ~140% then i have ~140GB free.

140% ===>> 1400%

if ~14 cores are busy then ~140GB is free

10GB per process

hmmm...





[HACKERS] Re: [HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory

2013-04-24 Thread Миша Тюрин
vm state

root@avi-sql09:~# /sbin/sysctl -a|grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 0
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256   256 32
vm.drop_caches = 0
vm.min_free_kbytes = 65536
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 65536
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1



Re: [HACKERS] GSOC Student Project Idea

2013-04-24 Thread Michael Schuh
On Wed, Apr 24, 2013 at 5:31 AM, Florian Pflug  wrote:

> On Apr23, 2013, at 23:25 , Alexander Korotkov 
> wrote:
> > I've taken a brief look on the paper and implementation. As I can see
> iDistance implements some global building strategy. I mean, for example, it
> selects some point, calculates distances from selected point to all points
> in dataset etc. So, it uses the whole dataset at the same time.
> >
> > However you can try to implement global index building in GiST or
> SP-GiST. In this case I think you should carefully estimate your
> capabilities during single GSoC project. You would need to extend GiST or
> SP-GiST interface and write completely new implementation of tree building
> algorithm. Question of how to exactly extend GiST or SP-GiST interface for
> this case could appear to be very hard even theoretically.
>
> +1. That seemed to be a major roadblock to me too when I read the paper.
>
> You could work around that by making partition identification a separate
> step. You'd have a function
>
>   idist_analyze(cfg name, table name, field name)
>
> which'd identify suitable partitions for the data distribution in
> table.field and store them somewhere. Such a set of pre-identified
> partitions would be akin to a tsearch configuration, i.e. all other parts
> of the iDistance machinery would use it to map points to index keys and
> queries to ranges of those keys. You'll want to look at how tsearch handles
> that, and check if the method can indeed be applied to iDistance.
>
> In a first cut, you'd probably only allow inserts into index which don't
> change the maximum distances from the partition centers that
> idist_analyze() found. With that restriction in place, you might get away
> without GiST or SP-GiST, and simply use postgres' standard btree, if you
> find a way to map queries of the form "field 
> idistance_knn_function(point, distance)" to "where (idistance_keys(field)
> between P1_lowerbound AND P2_upperbound) OR (idistance_keys(field) between
> P2_lowerbound AND P2_upperbound) …" or something equivalent.



Thank you both for the very helpful feedback. Perhaps the scope of this
project (the application's "completeness criteria") is better framed as
feasibility prototyping of the global/distance-based index strategy, with
B+-tree and/or GiST extension possibilities.

Let me briefly elaborate on the current implementation details related to
your comments. The public C++ version uses a stand-alone STX B+-tree
implementation, and has only been used in the research context of one-time
data loading and indexing and then KNN retrieval performance efficiency.
This requires an upfront global partition building (sequential look at all
data points) and a bit of overhead information about the partitions (such
as reference point locations and maximum distances in each). Of course,
there are a lot of improvements, modifications, variants, etc., that can be
built from this basic setup... but that's mostly my in-progress thesis work,
not yet published or defended :)

A given query is broken down into range(s) of key values in the B+-tree,
based on the negligible overhead info kept from partitioning. Only then
does this small subset of pages need to be loaded from disk, depending on
where the partitions are located with respect to the query. Therefore it is necessary
to have left/right leaf pointers within the B+-tree. While I think a
DBMS-integrated implementation would be more ideal for general use by
everyone, my naive assumption is that I could probably implement the
idistance functionality in stored procedures and metadata tables (at the
cost of performance and usability).

Best regards,
Mike Schuh


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/23/2013 07:53 PM, Timothy Garnett wrote:

Hi All,

Currently the -j option to pg_restore, which allows for 
parallelization in the restore, can only be used if the input file is 
a regular file and not, for ex., a pipe.  However this is a pretty 
common occurrence for us (usually in the form of pg_dump | pg_restore 
to copy an individual database or some tables thereof from one machine 
to another).  While there's no good way to parallelize the data load 
steps when reading from a pipe, the index and constraint building can 
still be parallelized and as they are generally CPU bound on our 
machines we've found quite a bit of speedup from doing so.


Attached are two diffs off of the REL9_2_4 tag that I've been using. 
 The first is a simple change that serially loads the data section 
before handing off the remainder of the restore to the existing 
parallelized restore code (the .ALT. diff).  The second which gets 
more parallelization but is a bit more of a change uses the existing 
dependency analysis code to allow index building etc. to occur in 
parallel with data loading. The data loading tasks are still performed 
serially in the main thread, but non-data loading tasks are scheduled 
in parallel as their dependencies are satisfied (with the caveat that 
the main thread can only dispatch new tasks between data loads).


Anyways, the question is if people think this is generally useful.  If 
so I can clean up the preferred choice a bit and rebase it off of 
master, etc.



I don't think these are bad ideas at all, and probably worth doing. Note 
that there are some fairly hefty changes affecting this code in master, 
so your rebasing could be tricky.


cheers

andrew




Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Dimitri Fontaine
Andrew Dunstan  writes:
> On 04/23/2013 07:53 PM, Timothy Garnett wrote:
>> Anyways, the question is if people think this is generally useful.  If so
>> I can clean up the preferred choice a bit and rebase it off of master,
>> etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all,
it just sounds like something worth doing too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:

Andrew Dunstan  writes:

On 04/23/2013 07:53 PM, Timothy Garnett wrote:

Anyways, the question is if people think this is generally useful.  If so
I can clean up the preferred choice a bit and rebase it off of master,
etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all;
it just sounds like something worth doing too.




That's not going to work; the output from parallel pg_dump is inherently 
multiple streams. That's why it ONLY supports directory format, and not 
even custom format on disk, let alone a pipe.
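
For reference, the combinations that do work can be shown as command-line usage (database names and paths here are placeholders):

```shell
# Parallel dump: directory format (-Fd) is required for --jobs
pg_dump -Fd -j 4 -f /backups/mydb.dir mydb

# Parallel restore from that directory
pg_restore -j 4 -d mydb_copy /backups/mydb.dir

# Piping through a single stream works, but only serially, via custom format
pg_dump -Fc mydb | pg_restore -d mydb_copy
```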


cheers

andrew




Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Andrew Dunstan


On 04/24/2013 03:49 PM, Andrew Dunstan wrote:


On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:

Andrew Dunstan  writes:

On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Anyways, the question is if people think this is generally useful.  
If so

I can clean up the preferred choice a bit and rebase it off of master,
etc.

I find this idea very useful yes.

Another idea would be to allow for parallel pg_dump output to somehow be
piped into a parallel pg_restore. I don't know how to solve that at all;
it just sounds like something worth doing too.




That's not going to work; the output from parallel pg_dump is 
inherently multiple streams. That's why it ONLY supports directory 
format, and not even custom format on disk, let alone a pipe.





What might make sense is something like pg_dump_restore which would have 
no intermediate storage at all, just pump the data etc from one source 
to another in parallel. But I pity the poor guy who has to write it :-)


cheers

andrew




Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Stefan Kaltenbrunner
On 04/24/2013 09:51 PM, Andrew Dunstan wrote:
> 
> On 04/24/2013 03:49 PM, Andrew Dunstan wrote:
>>
>> On 04/24/2013 03:40 PM, Dimitri Fontaine wrote:
>>> Andrew Dunstan  writes:
 On 04/23/2013 07:53 PM, Timothy Garnett wrote:
> Anyways, the question is if people think this is generally useful. 
> If so
> I can clean up the preferred choice a bit and rebase it off of master,
> etc.
>>> I find this idea very useful yes.
>>>
>>> Another idea would be to allow for parallel pg_dump output to somehow be
>>> piped into a parallel pg_restore. I don't know how to solve that at all;
>>> it just sounds like something worth doing too.
>>>
>>
>>
>> That's not going to work, the output from parallel pg_dump is
>> inherently multiple streams. That's why it ONLY supports directory
>> format, and not even custom format on disk, let alone a pipe.
>>
> 
> 
> What might make sense is something like pg_dump_restore which would have
> no intermediate storage at all, just pump the data etc from one source
> to another in parallel. But I pity the poor guy who has to write it :-)

hmm, pretty sure that Joachim's initial patch for parallel dump actually
had a PoC for something very similar to that...


Stefan




Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 08:20 +0100, Simon Riggs wrote:
> On 24 April 2013 01:10, Jeff Davis  wrote:
> > I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
> > checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
> > a new algorithm without forcing existing users to change.
> 
> That's exactly what the patch does.

The word "version" implies an ordering, though, as if N+1 were always
preferable to N. This is user-facing (through pg_controldata output),
otherwise I wouldn't mind.

> > initdb would have to take the code as an option, probably in string
> > form.
> 
> When/if we have multiple options we can add that. The main thing was
> to make sure the control file recorded things in a common way.

The main strange thing to me is that we're still using enabled/disabled
in the output of pg_controldata as well as the "version".

When we do have multiple options, it seems like we'd just have one field
output:

  Data page checksums: none|crc32c|pg-fnv

What goal are you trying to accomplish with this patch? pg_control
doesn't need to be compatible between releases, so can't we just add
this later when we really do have multiple options?

Regards,
Jeff Davis







Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 21:06, Jeff Davis  wrote:

> What goal are you trying to accomplish with this patch?

That we might need to patch the checksum version on a production release.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Joachim Wieland
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> wrote:

> > What might make sense is something like pg_dump_restore which would have
> > no intermediate storage at all, just pump the data etc from one source
> > to another in parallel. But I pity the poor guy who has to write it :-)
>
> hmm, pretty sure that Joachim's initial patch for parallel dump actually
> had a PoC for something very similar to that...


That's right, I implemented that as its own output format, named
"migrator" I think, which wouldn't write each stream to a file as the
directory output format does, but instead pumps it back into a restore
client.

Actually I think the logic was even reversed: it was a parallel restore
that got the data from internally calling pg_dump functionality instead of
from reading files... The neat thing about this approach was that the order
was optimized and correct, i.e. the largest tables start first and
dependencies get resolved in the right order.

I could revisit that patch for 9.4 if enough people are interested.

Joachim


Re: [HACKERS] Allowing parallel pg_restore from pipe

2013-04-24 Thread Claudio Freire
On Wed, Apr 24, 2013 at 6:47 PM, Joachim Wieland  wrote:
> On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner
>  wrote:
>>
>> > What might make sense is something like pg_dump_restore which would have
>> > no intermediate storage at all, just pump the data etc from one source
>> > to another in parallel. But I pity the poor guy who has to write it :-)
>>
>> hmm, pretty sure that Joachim's initial patch for parallel dump actually
>> had a PoC for something very similar to that...
>
>
> That's right, I implemented that as its own output format, named
> "migrator" I think, which wouldn't write each stream to a file as the
> directory output format does, but instead pumps it back into a restore
> client.
>
> Actually I think the logic was even reversed, it was a parallel restore that
> got the data from internally calling pg_dump functionality instead of from
> reading files... The neat thing about this approach was that the order was
> optimized and correct, i.e. largest tables start first and dependencies get
> resolved in the right order.
>
> I could revisit that patch for 9.4 if enough people are interested.


Indeed... I've wasted hours copying databases for test environments,
when that could've been an hour (singular).




Re: [HACKERS] Proposal to add --single-row to psql

2013-04-24 Thread Christopher Manning
Fabrízio and Tom,

I know that you can use --variable="FETCH_COUNT=1" from the
psql command line, but internally that uses a CURSOR to batch the rows,
and Redshift doesn't support CURSOR
(https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0), so
it's not an option when using psql to download data from Redshift.

Pavel's idea of having a single-row mode option to replace FETCH_COUNT is
interesting; does anyone have any problems with that, or alternative ideas?
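
For comparison, libpq has had single-row mode since 9.2, so a client can already stream rows one at a time without a cursor, roughly as below. The connection string, query, and error handling are placeholder-level, and it assumes a reachable server:

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Sketch: stream rows one at a time with libpq's single-row mode,
 * avoiding both cursors and whole-result buffering. */
int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=mydb");  /* placeholder */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return EXIT_FAILURE;
    }

    if (!PQsendQuery(conn, "SELECT * FROM big_table"))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* Must be called after PQsendQuery, before the first PQgetResult */
    PQsetSingleRowMode(conn);

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            printf("%s\n", PQgetvalue(res, 0, 0));  /* one row at a time */
        else if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return EXIT_SUCCESS;
}
```

A psql single-row mode option would essentially wire this up behind the scenes instead of declaring a cursor, which is why it would work against servers like Redshift.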

Regards,
Christopher



On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:

>
> On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane  wrote:
> >
> > Isn't there already a way to set FETCH_COUNT from the command line?
> > (ie, I think there's a generic variable-assignment facility that could
> > do this)
> >
>
> Christopher,
>
> Tom is right... from the psql [1] command line we can do that:
>
> $ bin/psql --variable=FETCH_COUNT=100
> psql (9.3devel)
> Type "help" for help.
>
> fabrizio=# \echo :FETCH_COUNT
> 100
>
> Regards,
>
> [1]
> http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
>


[HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Jeff Janes
With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
after a crash the postmaster will try to delete all files in the directory
stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
fine. But it seems rather hostile when it is set to a shared directory,
like the popular /dev/shm.

Previously, it would only try to delete the one file /dev/shm/pgstat.stat,
so the danger was much smaller.

If /dev/shm is used, this only comes into play when postgres has crashed
but the OS has not (if the OS has crashed, /dev/shm will be empty anyway
when it comes back), so perhaps this is not such a large exposure.

The docs don't discuss the issue of what happens if stats_temp_directory is
set to a non-private (to PGDATA) directory.  The docs also do not
explicitly recommend using /dev/shm, but there are plenty of examples of
that usage that come up on google (and no examples of using a private
subdirectory of /dev/shm rather than /dev/shm itself).

Does this need to be fixed, or at least documented?

Cheers,

Jeff


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Kyotaro HORIGUCHI
I had a bit of a look at it and came up with a hypothesis... umm, or a
scenario.

==

Just after a restartpoint, too-old xlog files are recycled, but their page
headers still have old content, specifically xlogid and xrecoff.

Plus, if the master's LSN is at the head of a new segment file, the file
for that segment may not have been created yet, and the WAL send request
for that LSN from the standby might fail with "requested WAL segment %s
has already been removed", even though the segment is really "not yet
created"...

So the standby receives a nack for the request and looks in the archive,
but the file is, as expected, not there. But a file for that segment is
certainly found in the pg_xlog directory, so XLogFileRead grabs the
stale-content file and...

> seems to show that the standby received a negative ack for the request
> for the 20th segment, and 5 seconds later, it tried to get that from the
> archive and somehow thought that it had gotten it, but the header is of
> the 6th segment.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Alvaro Herrera
Jeff Janes escribió:
> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
> after a crash the postmaster will try to delete all files in the directory
> stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
> fine. But it seems rather hostile when it is set to a shared directory,
> like the popular /dev/shm.

> Does this need to be fixed, or at least documented?

I think we need it fixed so that it only deletes the files matching a
well-known pattern.  In fact, there's an XXX comment about this in the
code:

/* XXX should we try to ignore files other than the ones we write? */

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread KONDO Mitsumasa

Hi,


I found a problem where start-up archive recovery fails in standby mode in 
PG9.2.4 streaming replication.
I tested the same case in PG9.2.3, but it does not occur there...



cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG:  0: restored log file 
"00020013" from archive


I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

Sorry... this suspicious message is written in Japanese.
It means "No such file or directory". That is because the starting point of 
the timeline 3 WAL files is 0003002A.
 

cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません

[* This point is the incorrect behavior] > [Standby] 2013-04-22 01:27:25 EDTLOG:  0: 
unexpected pageaddr 0/600 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:25 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication command: 
IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:25 EDTDEBUG:  0: received replication command: 
START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:25 EDTFATAL:  58P01: requested WAL segment 
00030014 has already been removed
[Master] 2013-04-22 01:27:25 EDTLOCATION:  XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDTLOG:  0: streaming replication 
successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_connect, 
libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from WAL 
stream: FATAL:  requested WAL segment 00030014 has already been 
removed

[Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_receive, 
libpqwalreceiver.c:389

[mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat 
`../arc/00030014': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:30 EDTDEBUG:  0: unexpected pageaddr 0/600 
in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication command: 
IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:30 EDTDEBUG:  0: received replication command: 
START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, 
walsender.c:449
[Master] 2013-04-22 01:27:30 EDTFATAL:  58P01: requested WAL segment 
00030014 has already been removed
[Master] 2013-04-22 01:27:30 EDTLOCATION:  XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDTLOG:  0: streaming replication 
successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDTLOCATION:  libpqrcv_connect, 
libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDTFATAL:  XX000: could not receive data from WAL 
stream: FATAL:  requested WAL segment 00030014 has already been 
removed


And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary or you need to fix your
recovery_command.

In this situation, I try to advance the timeline ID by archive recovery, not 
only to recover WAL files.
If archive recovery worked correctly on the standby, the recovery end point 
would be 00020029 on timeline ID 2.
Please see the details in pg9.2.4.log. The situation of the archive files is here.
 

My company colleagues and I have tried to fix this problem,
but the WAL control and validation logic is too difficult and complicated...

This problem occurs at xlog.c:10692.

10692         if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693             goto next_record_is_invalid;


And the problem occurs at this check in ValidXLOGHeader():

  4390         if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
  4391         {
  4392             ereport(emode_for_corrupt_record(emode, recaddr),
  4393                     (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
  4394                             hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
  4395                             readId, readSeg, readOff)));
  4396             return false;
  4397         }


We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare
case! We cannot work out why this value is incorrect!


I think the tests are just noticing that we don't have valid WA

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 21:09 +0100, Simon Riggs wrote:
> On 24 April 2013 21:06, Jeff Davis  wrote:
> 
> > What goal are you trying to accomplish with this patch?
> 
> That we might need to patch the checksum version on a production release.

Oh, I see.

I don't think we need two output fields from pg_controldata though. It's
a little redundant, and confused me when I was looking at the impact on
pg_upgrade. And it means nothing to the user until we actually have
multiple algorithms available, at which time we are better off with a
text representation.

Other than that, I think your patch is fine to accomplish the
aforementioned goal. Essentially, it just changes the bool to a uint32,
which I favor.

Regards,
Jeff Davis





Re: [HACKERS] danger of stats_temp_directory = /dev/shm

2013-04-24 Thread Tom Lane
Alvaro Herrera  writes:
> Jeff Janes escribió:
>> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
>> after a crash the postmaster will try to delete all files in the directory
>> stats_temp_directory.  When that is just a subdirectory of PGDATA, this is
>> fine. But it seems rather hostile when it is set to a shared directory,
>> like the popular /dev/shm.

>> Does this need to be fixed, or at least documented?

> I think we need it fixed so that it only deletes the files matching a
> well-known pattern.

I think we need it fixed to reject any stats_temp_directory that is not
postgres-owned with restrictive permissions.  The problem here is not
with what it deletes, it's with the insanely insecure configuration.

regards, tom lane




[HACKERS] Please add discussion topics for cluster-hackers meeting

2013-04-24 Thread Josh Berkus
Folks,

The 2013 cluster-hackers meeting is less than a month away.  If you are
attending, and have topics of common interest to discuss, such as core
APIs (i.e. event triggers are in, at least partway), please add your
topic to the wiki here:

https://wiki.postgresql.org/wiki/PgCon2013CanadaClusterSummit#10:30AM_to_11:30AM

... and make sure you're on the list of attendees, so that you get lunch!

Also, note that a 3-hour PostgresXC summit will follow the general
cluster-hackers meeting.  Anyone interested in working on XC is
encouraged to attend.

--Josh Berkus




Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-24 Thread Amit Langote
I also had a similar observation when I was able to reproduce this.
I tried to find out why the restartpoint causes the recycled segment to be
named after timeline 3, but have not been able to determine that.

When I looked at the source, I found that the function XLogFileReadAnyTLI,
which returns a segment file for reading an XLog page, iterates over a list
expectedTLIs, which starts with 3 in such a case (that is, in the case where
this error happens).  XLogFileReadAnyTLI checks for the segment in both the
archive and pg_xlog. So, even if such a segment (that is, one with timeline
3) is not in the archive, it may be in pg_xlog due to recycling, as we have
observed.  So such a recycled segment may be returned by XLogFileReadAnyTLI
as though it were the next segment to recover from, resulting in the
"unexpected pageaddr ..." error.

I could not understand how, in the case where this error happens, the
expectedTLIs list has 3 at its head (for example, when XLogFileReadAnyTLI
used it, as we observed), whereas at other times (when this error does not
happen) it has 2 at its head until all the segments of timeline 2 have been
recovered.
Also, how does the recycled segment get timeline 3 when 2 is expected in
this case?

Is this the right way to look at the problem and its possible fix?

I have tried going through the source regarding this but have not been able
to determine where this could accidentally happen, partly because I do not
understand the recovery process (and its code) very well. I will post if I
find anything useful.

regards,
Amit Langote









[HACKERS] Redundancy in comment within lock.c

2013-04-24 Thread Peter Geoghegan
Silly typo report. Line 3774 of lmgr's lock.c says:

 * Re-acquire a lock belonging to a transaction that was prepared, when
 * when starting up into hot standby mode.

This has been the case since the original hot standby commit.

-- 
Peter Geoghegan

