Re: [HACKERS] Enabling Checksums
On 24 April 2013 01:10, Jeff Davis wrote:
> On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote:
>> * make the pg_control.data_checksums field into a version number, for
>> future flexibility... patch attached
>
> Commenting on this separately because it's a separate issue.
>
> I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
> checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
> a new algorithm without forcing existing users to change.

That's exactly what the patch does.

> initdb would have to take the code as an option, probably in string
> form.

When/if we have multiple options we can add that. The main thing was to
make sure the control file recorded things in a common way.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion, loose ends
On 22.04.2013 18:44, Simon Riggs wrote:
> On 22 April 2013 09:29, Heikki Linnakangas wrote:
>> Hmm. That requires write access to $DATADIR, so that's not quite the
>> same thing as the trigger_file recovery.conf option.
>
> Well, you also (elsewhere) requested that we must keep recovery.conf
> in $DATADIR, so it needs to be writable.

That's a slightly different requirement. $DATADIR must be writable by the
process that restores the backup or puts the server into standby mode,
while trigger_file needs to be writable by the process that triggers
failover. Those are not necessarily the same thing. I'm thinking of a
heartbeat process that triggers failover by creating a file on an NFS
server or similar. Possibly the same location where the WAL archive is
located. $DATADIR would be stored elsewhere.

- Heikki
Re: [HACKERS] Fast promotion, loose ends
On 24 April 2013 08:23, Heikki Linnakangas wrote:
> On 22.04.2013 18:44, Simon Riggs wrote:
>> On 22 April 2013 09:29, Heikki Linnakangas wrote:
>>> Hmm. That requires write access to $DATADIR, so that's not quite the
>>> same thing as the trigger_file recovery.conf option.
>>
>> Well, you also (elsewhere) requested that we must keep recovery.conf
>> in $DATADIR, so it needs to be writable.
>
> That's a slightly different requirement. $DATADIR must be writable by
> the process that restores the backup or puts the server into standby
> mode, while trigger_file needs to be writable by the process that
> triggers failover. Those are not necessarily the same thing. I'm
> thinking of a heartbeat process that triggers failover by creating a
> file on an NFS server or similar. Possibly the same location where the
> WAL archive is located. $DATADIR would be stored elsewhere.

The default you've requested is fast promotion and I've agreed to that.
The ability to write a file called "promote" to $DATADIR is there as a
protection in case we need it in the field; it's not going to be the
primary mechanism any more. So if you're not intending to use it ever, it
doesn't seem worth discussing the fact you don't like its location.

But if you do want to discuss it, I think it's unreasonable of you to
demand recovery.conf cannot be outside $DATADIR and then also demand
related files be relocatable outside $DATADIR as well, the exact opposite.
We had the chance to avoid making $DATADIR writable externally and that's
gone now, at least for now.

Here's the patch I was intending to apply. Please let me know if you have
comments.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

set_fast_promotion_as_default.v1.patch
Description: Binary data
Re: [HACKERS] Fast promotion, loose ends
On 24.04.2013 10:57, Simon Riggs wrote:
> The default you've requested is fast promotion and I've agreed to that.
> The ability to write a file called "promote" to $DATADIR is there as a
> protection in case we need it in the field, it's not going to be the
> primary mechanism any more. So if you're not intending to use it ever,
> it doesn't seem worth discussing the fact you don't like its location.

Ok, works for me.

> But if you do want to discuss it, I think it's unreasonable of you to
> demand recovery.conf cannot be outside $DATADIR and then also demand
> related files be relocatable outside $DATADIR as well, the exact
> opposite. We had the chance to avoid making $DATADIR writable
> externally and that's gone now, at least for now.

As I said above, it's a different situation. recovery.conf has always been
in $DATADIR, and it's always been possible to point trigger_file
elsewhere, and you've always gotten full functionality using the
trigger_file. I just want to maintain that status quo. Which your patch
achieves, so I'm happy with that.

> Here's the patch I was intending to apply. Please let me know if you
> have comments.

Regarding the change in pg_ctl:

> 	/*
> -	 * Use two different kinds of promotion file so we can understand
> -	 * the difference between smart and fast promotion.
> +	 * For 9.3 onwards, use fast promotion as the default option.
> +	 * Promotion with a full checkpoint is still possible by writing
> +	 * a file called "promote", e.g.
> +	 * snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
> 	 */
> -	if (shutdown_mode >= FAST_MODE)
> -		snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);
> -	else
> -		snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
> +	snprintf(promote_file, MAXPGPATH, "%s/fast_promote", pg_data);

Should there be a version check there? I guess we've never guaranteed a
newer pg_ctl to work with an older server version, but it seems likely
that someone would try to do that, especially with "pg_ctl promote". With
the above change, creating $DATADIR/fast_promote in a 9.2 server's data
dir will do nothing. I'd suggest that we keep the filename unchanged,
"promote", and only change the behavior on the server side, so that it
performs fast promotion. If you want to have a "slow" promote file, we can
call that "slow_promote" or "checkpoint_then_promote" or something.

Aside from that, I assume you'll clean up the now-dead -m fast option etc.

- Heikki
Re: [HACKERS] Fast promotion, loose ends
On 24 April 2013 09:10, Heikki Linnakangas wrote:
> Regarding the change in pg_ctl:
>
> Should there be a version check there? I guess we've never guaranteed a
> newer pg_ctl to work with an older server version, but it seems likely
> that someone would try to do that, especially with "pg_ctl promote".
> With the above change, creating $DATADIR/fast_promote in a 9.2 server's
> data dir will do nothing. I'd suggest that we keep the filename
> unchanged, "promote", and only change the behavior on the server side,
> so that it performs fast promotion. If you want to have a "slow" promote
> file, we can call that "slow_promote" or "checkpoint_then_promote" or
> something.

pg_ctl already checks versions, so I don't see the point.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fast promotion, loose ends
On 24.04.2013 11:23, Simon Riggs wrote:
> pg_ctl already checks versions, so I don't see the point.

The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it
will create the fast_promote file and return success. But it won't
actually promote the server. I think that's bad.

If pg_ctl already has a check against that, fine, but I don't think it
does. Please make sure you test that before applying.

- Heikki
Re: [HACKERS] Fast promotion, loose ends
On 24 April 2013 09:32, Heikki Linnakangas wrote:
>> pg_ctl already checks versions, so I don't see the point.
>
> The point is, if you do "pgsql93/bin/pg_ctl -D $92DATADIR promote", it
> will create the fast_promote file and return success. But it won't
> actually promote the server. I think that's bad.
>
> If pg_ctl already has a check against that, fine, but I don't think it
> does. Please make sure you test that before applying.

If it doesn't check, that is not the only thing that would be broken. The
original commit of pg_ctl promote would also be broken.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fast promotion, loose ends
On 24.04.2013 11:46, Simon Riggs wrote:
>> pg_ctl already checks versions, so I don't see the point.
>
> If it doesn't check, that is not the only thing that would be broken.
> The original commit of pg_ctl promote would also be broken.

Yeah, it would've been good if the "pg_ctl promote" patch had added a
version check. Nevertheless, don't you think it would be good to avoid
changing the filename of the "promote" file, so that we don't have any
more such breakage? I don't see any advantage in changing it.

- Heikki
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Hello,

On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote:
> Hi,
>
> I found a problem with failing start-up archive recovery at Standby mode
> in PG9.2.4 streaming replication. I tested the same case in PG9.2.3, but
> it does not occur there...
>
> cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:25 EDT LOG: 0: restored log file
> > "00020013" from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such an
error message?

> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> [* This point is illegal work]
> > [Standby] 2013-04-22 01:27:25 EDT LOG: 0: unexpected pageaddr
> > 0/600 in log file 0, segment 20, offset 0
> > [Standby] 2013-04-22 01:27:25 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:25 EDT DEBUG: 0: received replication
> > command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:25 EDT LOCATION: HandleReplicationCommand,
> > walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDT DEBUG: 0: received replication
> > command: START_REPLICATION 0/1400
> > [Master] 2013-04-22 01:27:25 EDT LOCATION: HandleReplicationCommand,
> > walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDT FATAL: 58P01: requested WAL segment
> > 00030014 has already been removed
> > [Master] 2013-04-22 01:27:25 EDT LOCATION: XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:25 EDT LOG: 0: streaming replication
> > successfully connected to primary
> > [Standby] 2013-04-22 01:27:25 EDT LOCATION: libpqrcv_connect,
> > libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:25 EDT FATAL: XX000: could not receive data
> > from WAL stream: FATAL: requested WAL segment 00030014
> > has already been removed
> > [Standby] 2013-04-22 01:27:25 EDT LOCATION: libpqrcv_receive,
> > libpqwalreceiver.c:389
> >
> > [mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat
> > `../arc/00030014': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:30 EDT DEBUG: 0: unexpected pageaddr
> > 0/600 in log file 0, segment 20, offset 0
> > [Standby] 2013-04-22 01:27:30 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/00030014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/0004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:30 EDT DEBUG: 0: received replication
> > command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:30 EDT LOCATION: HandleReplicationCommand,
> > walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDT DEBUG: 0: received replication
> > command: START_REPLICATION 0/1400
> > [Master] 2013-04-22 01:27:30 EDT LOCATION: HandleReplicationCommand,
> > walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDT FATAL: 58P01: requested WAL segment
> > 00030014 has already been removed
> > [Master] 2013-04-22 01:27:30 EDT LOCATION: XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:30 EDT LOG: 0: streaming replication
> > successfully connected to primary
> > [Standby] 2013-04-22 01:27:30 EDT LOCATION: libpqrcv_connect,
> > libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:30 EDT FATAL: XX000: could not receive data
> > from WAL stream: FATAL: requested WAL segment 00030014
> > has already been removed

And this seems to be the second problem. You probably need to configure a
higher wal_keep_segments on the primary or you need to fix your
recovery_command.

> We tried to fix this problem with my company's colleagues, but the WAL
> control and validation method is too difficult and complicated...
>
> This problem occurs at xlog.c:10692.
>
> > 10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
> > 10693     goto next_record_is_invalid;
>
> And the problem occurs at this check in ValidXLOGHeader().
>
> > 4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
> > 4391 {
> > 4392     ereport(emode_for_corrupt_record(emode, recaddr),
> > 4393             (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
> > 4394                     hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
> > 4395                     readId, readSeg, readOff)));
> > 4396     return false;
> > 4397 }
>
> We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very
> rare case! We cannot analyze why this value is incorrect!

I think the test is just noticing that we don't have valid WAL - we are
probably reading old preallocated wal files from before starting the node
as a standby. Norma
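For readers following the thread, the two remedies Andres suggests would look roughly like this; the value and the archive path are placeholders, not recommendations:

```
# postgresql.conf on the primary -- keep enough WAL segments around that
# a briefly-disconnected standby can resume streaming instead of hitting
# "requested WAL segment ... has already been removed"
# (64 is an arbitrary example; size it to your write rate):
wal_keep_segments = 64

# recovery.conf on the standby -- restore_command must exit non-zero when
# the requested file does not exist, which a plain cp does:
standby_mode = 'on'
restore_command = 'cp /path/to/arc/%f "%p"'
```

The key contract is the exit status: archive recovery treats a zero exit from restore_command as "file restored", so any wrapper script must propagate failures rather than swallow them.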
Re: [HACKERS] Fast promotion, loose ends
On 24 April 2013 09:53, Heikki Linnakangas wrote:
> Yeah, it would've been good if the "pg_ctl promote" patch had added a
> version check. Nevertheless, don't you think it would be good to avoid
> changing the filename of the "promote" file, so that we don't have any
> more such breakage? I don't see any advantage in changing it.

Apart from all the reasons I already gave, no.

The filename isn't changing. We are adding a new capability and changing
the default.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Hello,

> > cp: cannot stat `../arc/00030013': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:25 EDT LOG: 0: restored log file
> > "00020013" from archive
>
> I can't read the error message here, but this looks suspicious.

The error message is "No such file or directory" in Japanese :-). Perhaps
restore_command is 'cp ../arc/%f %p'.

> The recovery command seems to be returning success although it prints
> such an error message?

It seems that, on 9.2.4, RestoreArchivedFile returns the expected xlog
filename even if the restore command ends up in failure. And the caller
XLogFileOpen successfully ets and returns an fd for the filename if the
file exists from before the restoration attempt. It does not matter when
the existing file and the file expected to be restored are identical in
content. If the segment includes the timeline increment point, something
might happen, I suppose..

regards,
--
Kyotaro Horiguchi
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Sorry,

> caller XLogFileOpen successfully ets and returns fd for the filename

The caller is XLogFileRead in this case. # and 'ets' is gets, of course.

regards,
--
Kyotaro Horiguchi
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
On 2013-04-24 19:16:12 +0900, Kyotaro HORIGUCHI wrote:
> It seems that, on 9.2.4, RestoreArchivedFile returns the expected xlog
> filename even if the restore command ends up in failure. And the caller
> XLogFileOpen successfully gets and returns an fd for the filename if the
> file exists from before the restoration attempt. It does not matter when
> the existing file and the file expected to be restored are identical in
> content.

But that's not what's happening here; afaics the "restored log file ..."
message is only printed if the return code is 0. And the return value from
RestoreArchivedFile() would be different in the case you describe, which
would make XLogFileRead() fail, so it should result in different errors.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Oops,

> But thats not what happening here, afaics the "restore log file ..."
> message is only printed if the returncode is 0.

You're right. 'cp' exits with the status code 1 (or 256?). The quoted log
lines simply show that the segment for tli=3 did not exist and that the
one for tli=2 was finally retrieved. It's quite normal and irrelevant to
the trouble mentioned. Sorry for the confusion.

Unfortunately, the script attached didn't reproduce the situation for me.
But in the log attached,

> [Standby] 2013-04-22 01:27:25 EDT FATAL: XX000: could not receive data
> from WAL stream: FATAL: requested WAL segment 00030014
> has already been removed
> cp: cannot stat `../arc/00030014': No such file or directory.
> [Standby] 2013-04-22 01:27:30 EDT DEBUG: 0: unexpected pageaddr
> 0/600 in log file 0, segment 20, offset 0

seems to show that the standby received a negative ack for its request for
the 20th segment, and 5 seconds later it tried to get that from the
archive and somehow thought it had gotten it, but the header is of the 6th
segment.

regards,
--
Kyotaro Horiguchi
Re: [HACKERS] GSOC Student Project Idea
On Apr23, 2013, at 23:25 , Alexander Korotkov wrote:
> I've taken a brief look on the paper and implementation. As I can see,
> iDistance implements some global building strategy. I mean, for example,
> it selects some point, calculates distances from the selected point to
> all points in the dataset etc. So, it uses the whole dataset at the same
> time.
>
> However you can try to implement global index building in GiST or
> SP-GiST. In this case I think you should carefully estimate your
> capabilities during a single GSoC project. You would need to extend the
> GiST or SP-GiST interface and write a completely new implementation of
> the tree building algorithm. The question of how exactly to extend the
> GiST or SP-GiST interface for this case could turn out to be very hard
> even theoretically.

+1. That seemed to be a major roadblock to me too when I read the paper.

You could work around that by making partition identification a separate
step. You'd have a function idist_analyze(cfg name, table name, field
name) which'd identify suitable partitions for the data distribution in
table.field and store them somewhere. Such a set of pre-identified
partitions would be akin to a tsearch configuration, i.e. all other parts
of the iDistance machinery would use it to map points to index keys and
queries to ranges of those keys. You'll want to look at how tsearch
handles that, and check if the method can indeed be applied to iDistance.

In a first cut, you'd probably only allow inserts into the index which
don't change the maximum distances from the partition centers that
idist_analyze() found. With that restriction in place, you might get away
without GiST or SP-GiST, and simply use postgres' standard btree, if you
find a way to map queries of the form "field idistance_knn_function(point,
distance)" to "where (idistance_keys(field) between P1_lowerbound AND
P1_upperbound) OR (idistance_keys(field) between P2_lowerbound AND
P2_upperbound) …" or something equivalent.

Or you could start with a GiST-based btree and extend it to support a
"distance" function. Looking at contrib/btree_gist and the built-in GiST
operator class for point types should help.

best regards,
Florian Pflug
Re: [HACKERS] Proposal to add --single-row to psql
On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane wrote:
> Isn't there already a way to set FETCH_COUNT from the command line?
> (ie, I think there's a generic variable-assignment facility that could
> do this)

Christopher, Tom is right... from the psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1] http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner wrote:
> Jeevan Chalke wrote:
>> On Mon, Apr 22, 2013 at 6:41 PM, Andres Freund wrote:
>>> On 2013-04-22 18:35:04 +0530, Jeevan Chalke wrote:
>>>> I have observed that the following sequence is causing a server crash.
>>>>
>>>> CREATE MATERIALIZED VIEW temp_class_mv AS
>>>>   SELECT * FROM pg_class
>>>>   WITH NO DATA;
>>>>
>>>> CREATE OR REPLACE FUNCTION test_refresh_mv()
>>>> RETURNS int
>>>> AS $$
>>>> BEGIN
>>>>   REFRESH MATERIALIZED VIEW temp_class_mv;
>>>>   return 1;
>>>> END; $$ LANGUAGE plpgsql;
>>>>
>>>> SELECT test_refresh_mv();
>>>>
>>>> I had a quick look over the crash and it is hitting the following
>>>> Assert in spi.c:
>>>>
>>>> else if (IsA(stmt, RefreshMatViewStmt))
>>>> {
>>>>     Assert(strncmp(completionTag,
>>>>                    "REFRESH MATERIALIZED VIEW ", 23) == 0);
>>>>     _SPI_current->processed = strtoul(completionTag + 23,
>>>>                                       NULL, 10);
>>>> }
>>>>
>>>> It seems like we are missing the expected value for completionTag in
>>>> ExecRefreshMatView()
>>>
>>> Possibly independent from this issue, but where did that 23 come from?
>
> When the consensus developed to change the syntax from LOAD
> MATERIALIZED VIEW I failed to notice the length here when making
> the changes for that.
>
>> BTW, attached is the patch which works well for me, but needs detailed
>> review.
>
> I suggest that we just rip out this section of code. Trying to
> provide a number here is probably all wrong, anyway. As the
> features evolve, there may not be a readily accessible rowcount for
> this command in all cases.

OK.

> Any objections to the attached to fix this issue?

Nope. Fine with me.

Thanks

> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.
Re: [HACKERS] high io BUT huge amount of free memory
On Tue, Apr 23, 2013 at 10:50 AM, Shaun Thomas wrote:
> This is most likely a NUMA issue. There really seems to be some kind of
> horrible flaw in the Linux kernel when it comes to properly handling
> NUMA on large memory systems.

Are you referring to the fact that vm.zone_reclaim_mode = 1 is an idiotic
default?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] missing time.h include in psql/command.c since the addition of \watch
Hi,

our internal testbuilds show a new warning on windows:

src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c it's neither surprising that we get that warning, nor
that we don't get it on non-windows builds. Those get time.h via
portability/instr_time.h.

So it seems we should add a time.h include to command.c. Trivial patch for
that attached.

Not sure why that warning isn't visible on the buildfarm though...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 09939fd..2e74284 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -13,6 +13,8 @@
 #endif
 
 #include <ctype.h>
+#include <time.h>
+
 #ifdef HAVE_PWD_H
 #include <pwd.h>
 #endif
Re: [HACKERS] 9.3 release notes suggestions
On Tue, Apr 23, 2013 at 11:41 PM, Bruce Momjian wrote:
> Thanks for the many suggestions on improving the 9.3 release notes.
> There were many ideas I would have never thought of. Please keep the
> suggestions coming.

Bruce,

Thanks for writing them!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] high io BUT huge amount of free memory
On 04/24/2013 08:24 AM, Robert Haas wrote:
> Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
> idiotic default?

Well... it is. But even on systems where it's not the default or is
explicitly disabled, there's just something hideously wrong with NUMA in
general. Take a look at our numa distribution on a heavily loaded system:

available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 14315 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 300 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

What the hell? Seriously? Using numactl and starting in interleave didn't
fix this, either. It just... arbitrarily ignores a huge chunk of memory
for no discernible reason.

The memory pressure code in Linux is extremely fucked up. I can't find it
right now, but the memory management algorithm makes some pretty
ridiculous assumptions once you pass half memory usage, regarding what is
in active and inactive cache.

I hate to rant, but it gets clearer to me every day that Linux is
optimized for desktop systems, and generally only kinda works for servers.
Once you start throwing vast amounts of memory, CPU, and processes at it
though, things start to get unpredictable.

That all goes back to my earlier threads on how disabling process
autogrouping via the kernel.sched_autogroup_enabled setting magically gave
us 20-30% better performance. The optimal setting for a server is clearly
to disable process autogrouping, and yet it's enabled by default, and
strongly advocated by Linus himself as a vast improvement.

I get it. It's better for desktop systems. But the LAMP stack alone has
probably a couple orders of magnitude more use cases than Joe Blow's
Pentium 4 in his basement. Yet it's the latter case that's optimized for.
Servers are getting shafted in a lot of cases, and it's actually starting
to make me angry.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

______

See http://www.peak6.com/email_disclaimer/ for terms and conditions
related to this email
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Jeevan Chalke wrote: > On Wed, Apr 24, 2013 at 3:04 AM, Kevin Grittner wrote: >> Any objections to the attached to fix this issue? > Nope. Fine with me. Pushed. Thanks for the report! -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] high io BUT huge amount of free memory
On 2013-04-24 08:39:09 -0500, Shaun Thomas wrote:
> The memory pressure code in Linux is extremely fucked up. I can't find it
> right now, but the memory management algorithm makes some pretty ridiculous
> assumptions once you pass half memory usage, regarding what is in active and
> inactive cache.
>
> I hate to rant, but it gets clearer to me every day that Linux is optimized
> for desktop systems, and generally only kinda works for servers. Once you
> start throwing vast amounts of memory, CPU, and processes at it though,
> things start to get unpredictable.

> That all goes back to my earlier threads that disabling process autogrouping
> via the kernel.sched_autogroup_enabled setting, magically gave us 20-30%
> better performance. The optimal setting for a server is clearly to disable
> process autogrouping, and yet it's enabled by default, and strongly
> advocated by Linus himself as a vast improvement.

> I get it. It's better for desktop systems. But the LAMP stack alone has
> probably a couple orders of magnitude more use cases than Joe Blow's Pentium
> 4 in his basement. Yet it's the latter case that's optimized for.

IIRC there have been some scalability improvements to that code.

> Servers are getting shafted in a lot of cases, and it's actually starting to
> make me angry.

Uh. Ranting can be a rather healthy thing every now and then, and it's good for the soul and such. But. Did you actually try reporting those issues? In my experience the mm and scheduler folks are rather helpful if they see you're actually interested in fixing a problem.

I have seen rants about this topic on various pg lists for the last few months, but I can't remember seeing mails on lkml about it. How should they fix what they don't know about?

You know, before Robert got access to the bigger machine we *also* had some very bad behaviour on them. And our writeout mechanism/buffer acquisition mechanism still utterly sucks there. But that doesn't mean we don't care. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] high io BUT huge amount of free memory
On 04/24/2013 08:49 AM, Andres Freund wrote:
> Uh. Ranting can be a rather healthy thing every now and then, and it's good
> for the soul and such. But. Did you actually try reporting those issues?

That's actually part of the problem. How do you report: "Throwing a lot of processes at a high-memory system seems to break the mm code in horrible ways."?

I'm asking seriously here, because I have no clue how to isolate this behavior. It's clearly happening often enough that random people are starting to notice now that bigger servers are becoming the norm.

I'm also not a kernel dev in any sense of the word. My C is so rusty, I can barely even read the patches going through the ML. I feel comfortable posting to PG lists because that's my bread and butter. Kernel lists seem way more imposing, and I'm probably not the only one who feels that way.

I guess I don't mean to imply that kernel devs don't care. Maybe the right way to put it is that there don't seem to be enough kernel devs being provided with more capable testing hardware. Which is odd, considering Red Hat's involvement and activity on the kernel.

I apologize, though. These last few months have been really frustrating thanks to this and other odd kernel-related issues. We've reached an equilibrium where the occasional waste of 20GB of RAM doesn't completely cripple the system, but this thread kinda struck a sore point. :)

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] high io BUT huge amount of free memory
On 2013-04-24 09:06:39 -0500, Shaun Thomas wrote:
> On 04/24/2013 08:49 AM, Andres Freund wrote:
> > Uh. Ranting can be a rather healthy thing every now and then, and it's good
> > for the soul and such. But. Did you actually try reporting those issues?
>
> That's actually part of the problem. How do you report:
>
> Throwing a lot of processes at a high-memory system seems to break the mm
> code in horrible ways.

Well. Report memory distribution. Report perf profiles. Ask *them* what information they need. They aren't grumpy if you are behaving sensibly. YMMV of course.

> I'm asking seriously here, because I have no clue how to isolate this
> behavior. It's clearly happening often enough that random people are
> starting to notice now that bigger servers are becoming the norm.
>
> I'm also not a kernel dev in any sense of the word. My C is so rusty, I can
> barely even read the patches going through the ML. I feel comfortable
> posting to PG lists because that's my bread and butter. Kernel lists seem
> way more imposing, and I'm probably not the only one who feels that way.

I can understand that. But you had to jump over the fence to post here once as well ;). Really, report it and see what comes out. The worst that can happen is that you get a grumpy email ;) And in the end, jumping might ease the pain in the long run considerably, even if it's uncomfortable at first... Feel free to CC me.

> I guess I don't mean to imply that kernel devs don't care. Maybe the right
> way to put it is that there don't seem to be enough kernel devs being
> provided with more capable testing hardware. Which is odd, considering Red
> Hat's involvement and activity on the kernel.

There are quite a few people using huge servers, but that doesn't imply they are seeing the same problems. During testing they mostly use a set of a few benchmarks (part of which is pgbench, btw) and apparently they don't show this problem. Also, this is horribly workload and hardware dependent. 
There are enough people happily using postgres on linux on far bigger hardware than what you reported upthread. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Allowing parallel pg_restore from pipe
Hi All,

Currently the -j option to pg_restore, which allows for parallelization in the restore, can only be used if the input file is a regular file and not, for example, a pipe. However, this is a pretty common occurrence for us (usually in the form of pg_dump | pg_restore to copy an individual database or some tables thereof from one machine to another). While there's no good way to parallelize the data load steps when reading from a pipe, the index and constraint building can still be parallelized, and as they are generally CPU bound on our machines we've found quite a bit of speedup from doing so.

Attached are two diffs off of the REL9_2_4 tag that I've been using. The first is a simple change that serially loads the data section before handing off the remainder of the restore to the existing parallelized restore code (the .ALT. diff). The second, which gets more parallelization but is a bit more of a change, uses the existing dependency analysis code to allow index building etc. to occur in parallel with data loading. The data loading tasks are still performed serially in the main thread, but non-data loading tasks are scheduled in parallel as their dependencies are satisfied (with the caveat that the main thread can only dispatch new tasks between data loads).

Anyways, the question is whether people think this is generally useful. If so, I can clean up the preferred choice a bit and rebase it off of master, etc.

Tim

0003-patch-pg_restore-to-allow-parallel-restore-when-the.ALT.patch Description: Binary data
0003-patch-pg_restore-to-allow-parallel-restore-when-the-.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
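[Editorial note: the scheduling constraint in the second diff (serial data loads in the main thread, with non-data tasks dispatched in parallel as their dependencies are satisfied, but only between data loads) can be sketched as a plain scheduling exercise. This is a hedged illustration in Python with an invented task structure, not the pg_restore code:]

```python
def schedule(tasks):
    """tasks: {name: {"deps": set of names, "is_data": bool}}.
    Return a dispatch order in which data loads stay serial, in input
    order, while non-data tasks (index builds etc.) are dispatched as
    soon as their dependencies are met. Dispatch opportunities only
    occur between data loads, mirroring the described caveat."""
    done, order = set(), []
    data_items = [n for n, t in tasks.items() if t["is_data"]]

    def ready_nondata():
        # non-data tasks whose dependencies are all complete
        return [n for n, t in tasks.items()
                if not t["is_data"] and n not in done and t["deps"] <= done]

    for d in data_items:
        # between data loads, dispatch any non-data work that is ready;
        # a batch here could run on the parallel worker pool
        batch = ready_nondata()
        order.extend(batch)
        done.update(batch)
        order.append(d)      # the data load itself is serial
        done.add(d)
    # after the last data load, drain everything that remains
    while True:
        batch = ready_nondata()
        if not batch:
            break
        order.extend(batch)
        done.update(batch)
    return order
```

[In this sketch, the index build for the first table is dispatched while the second table's data is still loading, which is where the claimed speedup comes from.]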
Re: [HACKERS] missing time.h include in psql/command.c since the addition of \watch
On 24.04.2013 16:27, Andres Freund wrote:
Hi,

our internal test builds show a new warning on windows:

src\bin\psql\command.c(2617): warning C4013: 'time' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'asctime' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]
src\bin\psql\command.c(2619): warning C4013: 'localtime' undefined; assuming extern returning int [C:\jenkins\workspace\andres_git.postgresql.org_windows\BT\release\SL_OS\windows\TA\x86\TO\xp\psql.vcxproj]

Looking at command.c, it's neither surprising that we get that warning, nor that we don't get it on non-windows builds. Those get time.h via portability/instr_time.h. So it seems we should add a time.h include to command.c. A trivial patch for that is attached.

Thanks, applied.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] putting a bgworker to rest
Andres Freund writes:
>> How would postmaster know when to restart a worker that stopped?
>
> I had imagined we would assign some return codes special
> meaning. Currently 0 basically means "restart immediately", 1 means
> "crashed, wait for some time", everything else results in a postmaster
> restart. It seems we can just assign return code 2 as "done", probably
> with some enum or such hiding the numbers.

In Erlang, the lib that cares about such things is called OTP, and it proposes a model of supervisor that knows when to restart a worker. The specs for the restart behaviour are:

Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.

- A permanent child process is always restarted.
- A temporary child process is never restarted (not even when the supervisor's restart strategy is rest_for_one or one_for_all and a sibling's death causes the temporary process to be terminated).
- A transient child process is restarted only if it terminates abnormally, i.e. with another exit reason than normal, shutdown or {shutdown,Term}.

Then about restart frequency, what they have is:

The supervisors have a built-in mechanism to limit the number of restarts which can occur in a given time interval. This is determined by the values of the two parameters MaxR and MaxT in the start specification returned by the callback function [ ... ] If more than MaxR number of restarts occur in the last MaxT seconds, then the supervisor terminates all the child processes and then itself.

You can read the whole thing here: http://www.erlang.org/doc/design_principles/sup_princ.html#id71215

I think we should get some inspiration from them here.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
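[Editorial note: the OTP semantics quoted above fit in a few lines of code. The sketch below is a hedged Python illustration of the restart decision and the MaxR/MaxT rate limit, with invented names; it is not PostgreSQL or Erlang code:]

```python
import time
from collections import deque

PERMANENT, TRANSIENT, TEMPORARY = "permanent", "transient", "temporary"

def should_restart(policy, exited_normally):
    """OTP restart decision: permanent children always restart,
    temporary children never do, transient children restart only
    after an abnormal exit."""
    if policy == PERMANENT:
        return True
    if policy == TEMPORARY:
        return False
    return not exited_normally  # TRANSIENT

class RestartLimiter:
    """Model of MaxR/MaxT: if more than max_r restarts occur within
    max_t seconds, the supervisor gives up (returns False)."""
    def __init__(self, max_r, max_t):
        self.max_r, self.max_t = max_r, max_t
        self.events = deque()

    def note_restart(self, now=None):
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # drop restart events older than the max_t window
        while self.events and now - self.events[0] > self.max_t:
            self.events.popleft()
        return len(self.events) <= self.max_r
```

[A "done" return code as Andres describes would map naturally onto the temporary policy, while today's "restart immediately" behaviour is the permanent one.]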
Re: [HACKERS] 9.3 Beta1 status report
On 24.04.2013 06:22, Bruce Momjian wrote: On Tue, Apr 23, 2013 at 06:56:34PM -0300, Alvaro Herrera wrote: Bruce Momjian wrote: On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote:

Do we usually repeat the changes listed in the backwards compatibility section later, in the "Changes" section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section:

We do not repeat the incompatible items later in the release notes. I have added some of your text to one of the items, and added a mention that tooling will need adjustment. Please check the post-commit version and let me know about adjustments.

Let me clarify --- changes to our WAL binary format and source code changes are not really incompatibilities from a user perspective, as we never promise to do our best to minimize such changes --- meaning the fact that the WAL format changed is something that would be only in the source code section and not in the "incompatibilities" section --- incompatibilities are related to changes in user experience or documented-API changes.

These guidelines make sense, except I think the change in the naming standard of xlog segments is important to document clearly because, even if we have not promised compatibility, people could be relying on it in scripts. I think it makes sense to waste a couple of lines documenting this change, even if we expect the number of people to be bitten by it to be very low.

Right. Kevin mentioned he had a script that knew about the numbering: http://www.postgresql.org/message-id/4fd09b5e022500048...@gw.wicourts.gov.

Agreed. Here is the new text:

Store WAL in a continuous stream, rather than skipping the last 16MB segment every 4GB (Heikki Linnakangas) BACKWARD COMPATIBILITY BREAK

Previously, WAL files ending in FF were not used. If you have WAL backup or restore scripts that took that skipping into account, they need to be adjusted.

Looks good, thanks! 
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
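[Editorial note: the naming change being documented is easy to see with a little arithmetic. The sketch below is a hedged Python illustration of the 24-hex-digit segment file names; the real logic lives in the server's xlog code, and this only models the skipping behaviour described in the release note:]

```python
def wal_segment_name(tli, logid, seg):
    """Format a WAL segment file name as three 8-hex-digit fields:
    timeline, log id, segment number (e.g. 000000010000000000000001)."""
    return f"{tli:08X}{logid:08X}{seg:08X}"

def segment_sequence(tli, start_log, count, skip_ff):
    """Enumerate successive WAL segment names. skip_ff=True models the
    pre-9.3 behaviour, where segment 0xFF of each 4GB log file was never
    used; skip_ff=False models the continuous 9.3+ stream."""
    per_log = 0xFF if skip_ff else 0x100
    names, logid, seg = [], start_log, 0
    while len(names) < count:
        names.append(wal_segment_name(tli, logid, seg))
        seg += 1
        if seg >= per_log:
            seg, logid = 0, logid + 1
    return names
```

[A backup script that assumed names ending in FF never exist would, on 9.3, silently skip a real segment; that is exactly the script breakage Kevin and Alvaro are worried about.]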
Re: [HACKERS] putting a bgworker to rest
On Tue, Apr 23, 2013 at 1:22 PM, Andres Freund wrote:
>> So a "done" worker would never be restarted, until postmaster sees a
>> crash or is itself restarted? I guess that'd be useful for workers
>> running during recovery, which terminate when recovery completes. Is
>> that your use case?
>
> Well, it's not actual postgres recovery, but something similar in the
> context of logical replication.

It's probably too late to be twiddling this very much more, but another thing I think would be useful is for backends to have the ability to request that the postmaster start a worker of type xyz, rather than having the server start it automatically at startup time. That's what you'd need for parallel query, and there might be some replication-related use cases for such things as well. The general usage pattern would be:

- regular backend realizes that it needs help
- kicks postmaster to start a helper process
- helper process runs for a while, doing work
- helper process finishes work, maybe waits around for some period of time to see if any new work arrives, and then exits
- eventually go back to step 1

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
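[Editorial note: the request/work/idle/exit cycle in the list above can be mocked up with ordinary threads standing in for the postmaster and helper. Everything here is an invented Python illustration of the lifecycle, not bgworker API:]

```python
import queue
import threading

def helper(work_q, results, idle_timeout):
    """Helper stand-in: drain work, linger idle_timeout seconds for
    more to arrive, then exit (the 'maybe waits around ... and then
    exits' step)."""
    while True:
        try:
            item = work_q.get(timeout=idle_timeout)
        except queue.Empty:
            return  # no new work arrived within the idle window: exit
        results.append(item * 2)  # pretend to do some work

def run_with_helper(items, idle_timeout=0.05):
    """Backend stand-in: realizes it needs help, 'kicks postmaster'
    (here: spawns a helper thread), hands it work, and waits for it
    to finish before reading the results."""
    work_q, results = queue.Queue(), []
    t = threading.Thread(target=helper, args=(work_q, results, idle_timeout))
    t.start()
    for item in items:
        work_q.put(item)
    t.join()  # helper exits on its own once the queue stays empty
    return results
```

[The idle timeout is the knob Robert alludes to: with a timeout of zero the helper is strictly one-shot, while a longer timeout amortizes the startup cost across bursts of work.]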
Examples Re: [HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates
On Tue, Apr 23, 2013 at 09:57:27AM -0700, David Fetter wrote:
> Folks,
>
> While testing the upcoming FILTER clause for aggregates, Erik Rijkers
> uncovered a long-standing bug in $subject, namely that this case
> wasn't handled. Please find attached a patch by Andrew Gierth and
> myself which fixes this issue and adds a regression test to ensure it
> remains fixed.

Please see below the results of running the regression test query on git master, REL9_2_STABLE, and REL9_1_STABLE, respectively:

$ psql postgres
psql (9.3devel)
Type "help" for help.
shackle@postgres:5493=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...

$ psql postgres
psql (9.2.4)
Type "help" for help.
shackle@postgres:5492=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...

$ psql postgres
psql (9.1.9)
Type "help" for help.
shackle@postgres:5491=# SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") FROM (VALUES ('foo','bar')) v(a,b);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: SELECT array_agg(a COLLATE "C" ORDER BY b COLLATE "POSIX") F...

Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re[2]: [HACKERS] high io BUT huge amount of free memory
thanks a lot for responses

1) just remind my case
Intel 32 core = 2*8*2 threads
Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
shared_buffers 64GB / constant hit rate - 99.18
max_connections 160 / with pgbouncer pools there could not be more than 120 connections at all
work_mem 32M
checkpoint 1h 1.0
swap off
numa off, interleaving on
and ! disks usage 100% (free 128GB! WHY?)
disk throughput - up to 30MB/s (24r+6w)
io - up to 2.5-3K/s (0.5w + 2-2.5r)
typical workload - pk-index-scans
my warm working set is about 400GB
db at all - 700GB

2) numactl
mtyurin@avi-sql09:~$ numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 393181 MB
node 0 free: 146029 MB
node distances:
node   0
  0:  10

3) !! i just found a suspicious relation between "active" processes and free memory. ~1GB per process. in 376GB total memory and 32 cores, if ( user cpu + io wait ) is ~140% then i have ~140GB free. but it could be just a coincidence.

4) now i think
a) upgrade the linux kernel (to 3.2!?) and then (if the case persists)
b) set buffers to something like 300-320GB

5) what do you know about the workload in Berkus's case http://www.databasesoup.com/2012/04/red-hat-kernel-cache-clearing-issue.html ?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory
typo:
> if ( user cpu + io wait ) is ~140% then i have ~140GB free.
140% ===>> 1400%
if ~14 cores are busy then ~140GB is free
10GB per process hmmm...
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Re[3]: [HACKERS] high io BUT huge amount of free memory
vm state:

root@avi-sql09:~# /sbin/sysctl -a|grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 0
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.drop_caches = 0
vm.min_free_kbytes = 65536
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 65536
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC Student Project Idea
On Wed, Apr 24, 2013 at 5:31 AM, Florian Pflug wrote: > On Apr23, 2013, at 23:25 , Alexander Korotkov > wrote: > > I've taken a brief look on the paper and implementation. As I can see > iDistance implements some global building strategy. I mean, for example, it > selects some point, calculates distances from selected point to all points > in dataset etc. So, it uses the whole dataset at the same time. > > > > However you can try to implement global index building in GiST or > SP-GiST. In this case I think you should carefully estimate your > capabilities during single GSoC project. You would need to extend GiST or > SP-GiST interface and write completely new implementation of tree building > algorithm. Question of how to exactly extend GiST or SP-GiST interface for > this case could appear to be very hard even theoretically. > > +1. That seemed to be a major roadblock to me too when I read the paper. > > You could work around that by making partition identification a separate > step. You'd have a function > > idist_analyze(cfg name, table name, field name) > > which'd identify suitable partitions for the data distribution in > table.field and store them somewhere. Such a set of pre-identified > partitions would be akin to a tsearch configuration, i.e. all other parts > of the iDistance machinery would use it to map points to index keys and > queries to ranges of those keys. You'll want to look at how tsearch handles > that, and check if the method can indeed be applied to iDistance. > > In a first cut, you'd probably only allow inserts into index which don't > change the maximum distances from the partition centers that > idist_analyze() found. 
With that restriction in place, you might get away
> without GiST or SP-GiST, and simply use postgres' standard btree, if you
> find a way to map queries of the form "field > idistance_knn_function(point,
> distance)" to "where (idistance_keys(field) between P1_lowerbound AND
> P1_upperbound) OR (idistance_keys(field) between P2_lowerbound AND
> P2_upperbound) …" or something equivalent.

Thank you both for the very helpful feedback. Perhaps the scope of this project (the application's "completeness criteria") is better as a feasibility prototyping of the global/distance-based index strategy, with B+-tree and/or GiST extension possibilities. Let me briefly elaborate on the current implementation details related to your comments.

The public C++ version uses a stand-alone STX B+-tree implementation, and has only been used in the research context of one-time data loading and indexing followed by KNN retrieval performance measurement. This requires an upfront global partition building (a sequential look at all data points) and a bit of overhead information about the partitions (such as reference point locations and maximum distances in each). Of course, there are a lot of improvements, modifications, variants, etc., that can be built from this basic setup... but that's mostly my in-progress thesis work, not yet published or defended :)

A given query is broken down into range(s) of key values in the B+-tree, based on the negligible overhead info kept from partitioning. Only then does this small subset of pages need to be loaded from disk, depending on where the partitions are located with respect to the query. Therefore it is necessary to have left/right leaf pointers within the B+-tree.

While I think a DBMS-integrated implementation would be more ideal for general use by everyone, my naive assumption is that I could probably implement the iDistance functionality in stored procedures and metadata tables (at the cost of performance and usability).

Best regards, Mike Schuh
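[Editorial note: for readers unfamiliar with iDistance, the key mapping under discussion is compact enough to sketch. This is a hedged Python illustration with invented names and Euclidean distance: each point's B+-tree key is its partition's base offset plus the distance to that partition's reference point, and a KNN probe becomes one key range per intersected partition:]

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def idistance_key(point, refs, c):
    """Map a point to a scalar B+-tree key: i*c + dist(point, refs[i]),
    where refs[i] is the nearest reference point and the constant c is
    chosen larger than any within-partition distance, so each partition
    owns a disjoint key range."""
    i = min(range(len(refs)), key=lambda j: dist(point, refs[j]))
    return i * c + dist(point, refs[i])

def knn_key_ranges(query, radius, refs, max_dists, c):
    """For a query sphere (query, radius), return one key range per
    partition whose data region (bounded by max_dists[i]) the sphere can
    intersect; only the leaf pages in these ranges need to be read."""
    ranges = []
    for i, ref in enumerate(refs):
        d = dist(query, ref)
        if d - radius <= max_dists[i]:
            lo = i * c + max(0.0, d - radius)
            hi = i * c + min(max_dists[i], d + radius)
            ranges.append((lo, hi))
    return ranges
```

[The per-partition max_dists values are exactly the "bit of overhead information" Mike mentions keeping from the partitioning step, and the key ranges are what Florian's hypothetical idistance_keys(field) BETWEEN ... predicates would scan.]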
Re: [HACKERS] Allowing parallel pg_restore from pipe
On 04/23/2013 07:53 PM, Timothy Garnett wrote: Hi All, Currently the -j option to pg_restore, which allows for parallelization in the restore, can only be used if the input file is a regular file and not, for ex., a pipe. However this is a pretty common occurrence for us (usually in the form of pg_dump | pg_restore to copy an individual database or some tables thereof from one machine to another). While there's no good way to parallelize the data load steps when reading from a pipe, the index and constraint building can still be parallelized and as they are generally CPU bound on our machines we've found quite a bit of speedup from doing so. Attached is two diffs off of the REL9_2_4 tag that I've been using. The first is a simple change that serially loads the data section before handing off the remainder of the restore to the existing parallelized restore code (the .ALT. diff). The second which gets more parallelization but is a bit more of a change uses the existing dependency analysis code to allow index building etc. to occur in parallel with data loading. The data loading tasks are still performed serially in the main thread, but non-data loading tasks are scheduled in parallel as their dependencies are satisfied (with the caveat that the main thread can only dispatch new tasks between data loads). Anyways, the question is if people think this is generally useful. If so I can clean up the preferred choice a bit and rebase it off of master, etc. I don't think these are bad ideas at all, and probably worth doing. Note that there are some fairly hefty changes affecting this code in master, so your rebasing could be tricky. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing parallel pg_restore from pipe
Andrew Dunstan writes:
> On 04/23/2013 07:53 PM, Timothy Garnett wrote:
>> Anyways, the question is if people think this is generally useful. If so
>> I can clean up the preferred choice a bit and rebase it off of master,
>> etc.

I find this idea very useful, yes.

Another idea would be to allow for parallel pg_dump output to somehow be piped into a parallel pg_restore. I don't know how to solve that at all; it just sounds like something worth doing too.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing parallel pg_restore from pipe
On 04/24/2013 03:40 PM, Dimitri Fontaine wrote: Andrew Dunstan writes: On 04/23/2013 07:53 PM, Timothy Garnett wrote: Anyways, the question is if people think this is generally useful. If so I can clean up the preferred choice a bit and rebase it off of master, etc. I find this idea very useful yes. Another idea would be to allow for parallel pg_dump output to somehow be piped into a parallel pg_restore. I don't know how to solve that at all, it just sound something worthy of doing too. That's not going to work, the output from parallel pg_dump is inherently multiple streams. That's why it ONLY supports directory format, and not even custom format on disk, let alone a pipe. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing parallel pg_restore from pipe
On 04/24/2013 03:49 PM, Andrew Dunstan wrote: On 04/24/2013 03:40 PM, Dimitri Fontaine wrote: Andrew Dunstan writes: On 04/23/2013 07:53 PM, Timothy Garnett wrote: Anyways, the question is if people think this is generally useful. If so I can clean up the preferred choice a bit and rebase it off of master, etc. I find this idea very useful yes. Another idea would be to allow for parallel pg_dump output to somehow be piped into a parallel pg_restore. I don't know how to solve that at all, it just sound something worthy of doing too. That's not going to work, the output from parallel pg_dump is inherently multiple streams. That's why it ONLY supports directory format, and not even custom format on disk, let alone a pipe. What might make sense is something like pg_dump_restore which would have no intermediate storage at all, just pump the data etc from one source to another in parallel. But I pity the poor guy who has to write it :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing parallel pg_restore from pipe
On 04/24/2013 09:51 PM, Andrew Dunstan wrote: > > On 04/24/2013 03:49 PM, Andrew Dunstan wrote: >> >> On 04/24/2013 03:40 PM, Dimitri Fontaine wrote: >>> Andrew Dunstan writes: On 04/23/2013 07:53 PM, Timothy Garnett wrote: > Anyways, the question is if people think this is generally useful. > If so > I can clean up the preferred choice a bit and rebase it off of master, > etc. >>> I find this idea very useful yes. >>> >>> Another idea would be to allow for parallel pg_dump output to somehow be >>> piped into a parallel pg_restore. I don't know how to solve that at all, >>> it just sound something worthy of doing too. >>> >> >> >> That's not going to work, the output from parallel pg_dump is >> inherently multiple streams. That's why it ONLY supports directory >> format, and not even custom format on disk, let alone a pipe. >> > > > What might make sense is something like pg_dump_restore which would have > no intermediate storage at all, just pump the data etc from one source > to another in parallel. But I pity the poor guy who has to write it :-) hmm pretty sure that Joachims initial patch for parallel dump actually had a PoC for something very similiar to that... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-24 at 08:20 +0100, Simon Riggs wrote:
> On 24 April 2013 01:10, Jeff Davis wrote:
> > I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no
> > checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with
> > a new algorithm without forcing existing users to change.
>
> That's exactly what the patch does.

The word "version" implies an ordering, though, as if N+1 were always preferable to N. This is user-facing (through pg_controldata output), otherwise I wouldn't mind.

> > initdb would have to take the code as an option, probably in string
> > form.
>
> When/if we have multiple options we can add that. The main thing was
> to make sure the control file recorded things in a common way.

The main strange thing to me is that we're still using enabled/disabled for the output of pg_controldata as well as the "version". When we do have multiple options, it seems like we'd just have one output field:

Data page checksums: none|crc32c|pg-fnv

What goal are you trying to accomplish with this patch? pg_control doesn't need to be compatible between releases, so can't we just add this later, when we really do have multiple options?

Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
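[Editorial note: to illustrate the distinction being argued here, an ID code is just a registry lookup with no implied ordering, and pg_controldata would print the algorithm name rather than enabled/disabled. The codes and names below are illustrative only; code 1 follows Jeff's earlier FNV-1a-SR3 example, and a hypothetical code 2 shows the future-extension case:]

```python
# Hypothetical registry: code 0 is reserved for "no checksums"; any other
# code names an algorithm, with no preference ordering between codes.
CHECKSUM_ALGORITHMS = {
    0: "none",
    1: "FNV-1a-SR3",
    # a later release could register e.g. 2: "CRC-32C" here without
    # forcing clusters that still store code 1 to change anything
}

def describe_checksums(code):
    """Render a pg_controldata-style line from the stored ID code."""
    try:
        name = CHECKSUM_ALGORITHMS[code]
    except KeyError:
        raise ValueError(f"unrecognized checksum ID code: {code}")
    return f"Data page checksums: {name}"
```

[With a registry like this, initdb could accept the string form of the name and store the matching code, which is the option Jeff suggests deferring until more than one algorithm exists.]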
Re: [HACKERS] Enabling Checksums
On 24 April 2013 21:06, Jeff Davis wrote:
> What goal are you trying to accomplish with this patch?

That we might need to patch the checksum version on a production release.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Allowing parallel pg_restore from pipe
On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> wrote:
> > What might make sense is something like pg_dump_restore which would have
> > no intermediate storage at all, just pump the data etc from one source
> > to another in parallel. But I pity the poor guy who has to write it :-)
>
> hmm pretty sure that Joachim's initial patch for parallel dump actually
> had a PoC for something very similar to that...

That's right. I implemented that as its own output format, named
"migrator" I think, which wouldn't write each stream to a file as the
directory output format does, but instead pumps it back into a restore
client.

Actually, I think the logic was even reversed: it was a parallel restore
that got the data from internally calling pg_dump functionality instead
of from reading files... The neat thing about this approach was that the
order was optimized and correct, i.e. the largest tables start first and
dependencies get resolved in the right order.

I could revisit that patch for 9.4 if enough people are interested.

Joachim
Re: [HACKERS] Allowing parallel pg_restore from pipe
On Wed, Apr 24, 2013 at 6:47 PM, Joachim Wieland wrote:
> On Wed, Apr 24, 2013 at 4:05 PM, Stefan Kaltenbrunner wrote:
>>
>> > What might make sense is something like pg_dump_restore which would have
>> > no intermediate storage at all, just pump the data etc from one source
>> > to another in parallel. But I pity the poor guy who has to write it :-)
>>
>> hmm pretty sure that Joachim's initial patch for parallel dump actually
>> had a PoC for something very similar to that...
>
> That's right, I implemented that as its own output format, named
> "migrator" I think, which wouldn't write each stream to a file as the
> directory output format does but instead pumps it back into a restore
> client.
>
> Actually I think the logic was even reversed: it was a parallel restore
> that got the data from internally calling pg_dump functionality instead
> of from reading files... The neat thing about this approach was that the
> order was optimized and correct, i.e. largest tables start first and
> dependencies get resolved in the right order.
>
> I could revisit that patch for 9.4 if enough people are interested.

Indeed... I've wasted hours copying databases for test environments, when
that could've been hour (singular).
Re: [HACKERS] Proposal to add --single-row to psql
Fabrízio and Tom,

I know that you can use --variable="FETCH_COUNT=1" from the psql command
line, but internally that uses a CURSOR to batch the rows, and [Redshift
doesn't support CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0),
so it's not an option when using psql to download data from Redshift.

Pavel's idea of having a single-row-mode option to replace FETCH_COUNT is
interesting; does anyone have any problems with that, or alternative ideas?

Regards,
Christopher

On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:
>
> On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane wrote:
> >
> > Isn't there already a way to set FETCH_COUNT from the command line?
> > (ie, I think there's a generic variable-assignment facility that could
> > do this)
>
> Christopher,
>
> Tom is right... from the psql [1] command line we can do that:
>
> $ bin/psql --variable=FETCH_COUNT=100
> psql (9.3devel)
> Type "help" for help.
>
> fabrizio=# \echo :FETCH_COUNT
> 100
>
> Regards,
>
> [1] http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
[HACKERS] danger of stats_temp_directory = /dev/shm
With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev,
after a crash the postmaster will now try to delete all files in the
directory stats_temp_directory. When that is just a subdirectory of
PGDATA, this is fine. But it seems rather hostile when it is set to a
shared directory, like the popular /dev/shm. Previously, it would only
try to delete the one file /dev/shm/pgstat.stat, so the danger was much
smaller.

If /dev/shm is used, this only comes into play when postgres has crashed
but the OS has not (if the OS has crashed, /dev/shm will be empty anyway
when it comes back), so perhaps this is not such a large exposure.

The docs don't discuss what happens if stats_temp_directory is set to a
directory that is not private to PGDATA. The docs also do not explicitly
recommend using /dev/shm, but there are plenty of examples of that usage
that come up on Google (and no examples of using a private subdirectory
of /dev/shm rather than /dev/shm itself).

Does this need to be fixed, or at least documented?

Cheers,

Jeff
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
I had a look at it and came up with a hypothesis... umm, or a scenario.

==
Just after a restartpoint, too-old xlog files are recycled, but their
page headers still have the old content, specifically xlogid and
xrecoff.

Plus, if the master's LSN is at the head of a new segment file, the file
for that segment may not have been created yet, and the WAL send request
for that LSN from the standby might fail with "requested WAL segment %s
has already been removed", even though the segment is "not yet
created"...

So the standby received a nack for the request and looks in the archive,
but the file is, correctly, not there. However, a file for that segment
is certainly found in the pg_xlog directory. So XLogFileRead grabs the
file with the stale contents and...

> seems showing that the standby received negative ack for the request
> for 20th segment, and 5 seconds later, it tried to get that from
> archive and somehow thought that it'd gotten but the header is of 6th
> segment.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] danger of stats_temp_directory = /dev/shm
Jeff Janes escribió:
> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
> after a crash the postmaster will try to delete all files in the directory
> stats_temp_directory. When that is just a subdirectory of PGDATA, this is
> fine. But it seems rather hostile when it is set to a shared directory,
> like the popular /dev/shm.

> Does this need to be fixed, or at least documented?

I think we need it fixed so that it only deletes the files matching a
well-known pattern. In fact, there's an XXX comment about this in the
code:

    /* XXX should we try to ignore files other than the ones we write? */

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
Hi,

I found a problem: start-up archive recovery fails in standby mode in
PG 9.2.4 streaming replication. I tested for the same problem in
PG 9.2.3, but it does not occur there.

cp: cannot stat `../arc/00030013': No such file or directory
[Standby] 2013-04-22 01:27:25 EDT LOG: 0: restored log file "00020013" from archive

> I can't read the error message here, but this looks suspicious. The
> recovery command seems to be returning success although it prints such
> an error message?

Sorry, that suspicious message is written in Japanese. It means "No such
file or directory". That is because the starting point of the timeline-3
WAL is file 0003002A.

cp: cannot stat `../arc/00030014': No such file or directory

[* This point is illegal work]

[Standby] 2013-04-22 01:27:25 EDT LOG: 0: unexpected pageaddr 0/600 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:25 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': No such file or directory
cp: cannot stat `../arc/0004.history': No such file or directory
[Master] 2013-04-22 01:27:25 EDT DEBUG: 0: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDT LOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDT DEBUG: 0: received replication command: START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:25 EDT LOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDT FATAL: 58P01: requested WAL segment 00030014 has already been removed
[Master] 2013-04-22 01:27:25 EDT LOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDT LOG: 0: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDT LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDT FATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 00030014 has already been removed
[Standby] 2013-04-22 01:27:25 EDT LOCATION: libpqrcv_receive, libpqwalreceiver.c:389
[mitsu-ko@localhost postgresql-9.2.4]$ bin
cp: cannot stat `../arc/00030014': No such file or directory
[Standby] 2013-04-22 01:27:30 EDT DEBUG: 0: unexpected pageaddr 0/600 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00030014': No such file or directory
cp: cannot stat `../arc/0004.history': No such file or directory
[Master] 2013-04-22 01:27:30 EDT DEBUG: 0: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDT LOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDT DEBUG: 0: received replication command: START_REPLICATION 0/1400
[Master] 2013-04-22 01:27:30 EDT LOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDT FATAL: 58P01: requested WAL segment 00030014 has already been removed
[Master] 2013-04-22 01:27:30 EDT LOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDT LOG: 0: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDT LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDT FATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 00030014 has already been removed

> And this seems to be the second problem. You probably need to configure
> a higher wal_keep_segments on the primary or you need to fix your
> recovery_command.

In this situation, I am trying to advance the timeline ID by archive
recovery, not to also recover WAL files. If archive recovery worked
correctly on the standby, the recovery-done point would be
00020029 on timeline 2. Please see the details in
pg9.2.4.log; the state of the archive files is shown there.

We tried to fix this problem with my company's colleagues, but the WAL
control and validation logic is too difficult and complicated for us...

The problem occurs in xlog.c:10692:

10692         if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693             goto next_record_is_invalid;

and specifically in this check inside ValidXLOGHeader():
4390         if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
4391         {
4392             ereport(emode_for_corrupt_record(emode, recaddr),
4393                     (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
4394                             hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
4395                             readId, readSeg, readOff)));
4396             return false;
4397         }

We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very
rare case! We cannot work out why this value is incorrect!

I think the tests is just noticing that we don't have valid WA
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-24 at 21:09 +0100, Simon Riggs wrote:
> On 24 April 2013 21:06, Jeff Davis wrote:
>
> > What goal are you trying to accomplish with this patch?
>
> That we might need to patch the checksum version on a production release.

Oh, I see.

I don't think we need two output fields from pg_controldata though. It's
a little redundant, and confused me when I was looking at the impact on
pg_upgrade. And it means nothing to the user until we actually have
multiple algorithms available, at which time we are better off with a
text representation.

Other than that, I think your patch is fine to accomplish the
aforementioned goal. Essentially, it just changes the bool to a uint32,
which I favor.

Regards,
    Jeff Davis
Re: [HACKERS] danger of stats_temp_directory = /dev/shm
Alvaro Herrera writes:
> Jeff Janes escribió:
>> With the stats file split patch 187492b6c2e8cafc5 introduced in 9.3dev, now
>> after a crash the postmaster will try to delete all files in the directory
>> stats_temp_directory. When that is just a subdirectory of PGDATA, this is
>> fine. But it seems rather hostile when it is set to a shared directory,
>> like the popular /dev/shm.
>> Does this need to be fixed, or at least documented?

> I think we need it fixed so that it only deletes the files matching a
> well-known pattern.

I think we need it fixed to reject any stats_temp_directory that is not
postgres-owned with restrictive permissions. The problem here is not
with what it deletes, it's with the insanely insecure configuration.

			regards, tom lane
[HACKERS] Please add discussion topics for cluster-hackers meeting
Folks,

The 2013 cluster-hackers meeting is less than a month away. If you are
attending, and have topics of common interest to discuss, such as core
APIs (i.e. event triggers are in, at least partway), please add your
topic to the wiki here:

https://wiki.postgresql.org/wiki/PgCon2013CanadaClusterSummit#10:30AM_to_11:30AM

... and make sure you're on the list of attendees, so that you get
lunch!

Also, note that a 3-hour PostgresXC summit will follow the general
cluster-hackers meeting. Anyone interested in working on XC is
encouraged to attend.

--Josh Berkus
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
I also made a similar observation when I reproduced this. I tried to
find out why the restartpoint causes the recycled segment to be named
after timeline 3, but have not been able to determine that.

When I looked at the source, I found that the function
XLogFileReadAnyTLI, which returns a segment file for reading an XLOG
page, iterates over a list expectedTLIs, which starts with 3 in such a
case (that is, in the case where this error happens).
XLogFileReadAnyTLI checks for the segment in both the archive and
pg_xlog. So even if such a segment (that is, one with timeline 3) is
not in the archive, it may be in pg_xlog due to recycling, as we have
observed. Such a recycled segment may then be returned by
XLogFileReadAnyTLI as though it were the next segment to recover from,
resulting in the "unexpected pageaddr ..." error.

I could not understand how (in the case where this error happens) the
expectedTLIs list comes to have 3 at its head (for example, when
XLogFileReadAnyTLI uses it, as we observed), whereas at other times
(when this error does not happen) it has 2 at its head until all the
segments of timeline 2 have been recovered from. Also, how does the
recycled segment get timeline 3 when 2 is expected in this case?

Is this the right way to look at the problem and its possible fix? I
have tried going through the source regarding this but have not been
able to determine where this could accidentally happen, partly because
I do not understand the recovery process (and its code) very well. Will
post if I find anything useful.

regards,
Amit Langote
[HACKERS] Redundancy in comment within lock.c
Silly typo report. Line 3774 of lmgr's lock.c says:

 * Re-acquire a lock belonging to a transaction that was prepared, when
 * when starting up into hot standby mode.

This has been the case since the original hot standby commit.

--
Peter Geoghegan