Re: [HACKERS] Inconsistent DB data in Streaming Replication
> What Samrat is proposing here is that WAL is not flushed to the OS before
> it is acked by a synchronous replica so recovery won't go past the
> timeline change made in failover, making it necessary to take a new
> base backup to resync with the new master.

Actually, we are proposing that the data page on the master is not written until the master receives an ACK from the standby. The WAL records can be flushed to disk on both the master and the standby before the standby sends the ACK to the master. The end objective is the same: avoiding a base backup of the old master to resync with the new master.

> Why do you think that the inconsistent data after failover happens is
> problem? Because it's one of the reasons why a fresh base backup is
> required when starting old master as new standby? If yes, I agree with
> you. I've often heard the complaints about a backup when restarting
> new standby. That's really big problem.

Yes, taking a backup is a major problem when the database size is more than several TB. It would take a very long time to ship the backup data over a slow WAN link.

>> One solution to avoid this situation is to have the master send WAL records to the standby and wait for an ACK from the standby committing the WAL files to disk, and only after that write the data page related to this transaction on the master.

> You mean to make the master wait the data page write until WAL has been
> not only flushed to disk but also replicated to the standby?

Yes. The master should not write the data page before the corresponding WAL records have been replicated to the standby, i.e. before the WAL records have been flushed to disk on both master and standby.

>> The main drawback would be increased wait time for the client due to the extra round trip to the standby before the master sends the ACK to the client. Are there any other issues with this approach?

> I think that you can introduce GUC specifying whether this extra check
> is required to avoid a backup when failback.

That would be a better idea.
We can disable it whenever taking a fresh backup is not a problem.

Regards,
Samrat

On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao wrote:
> On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade wrote:
>> Hello,
>>
>> We have been trying to figure out possible solutions to the following
>> problem in streaming replication. Consider the following scenario:
>>
>> If the master receives a commit command, it writes and flushes the commit
>> WAL records to disk. It also writes and flushes the data page related to
>> this transaction.
>>
>> The master then sends WAL records to the standby up to the commit WAL
>> record. But if failover happens before these records are sent, the old
>> master is ahead of the standby (which is now the new master) in terms of
>> DB data, leading to inconsistent data.
>
> Why do you think that the inconsistent data after failover happens is
> problem? Because it's one of the reasons why a fresh base backup is
> required when starting old master as new standby? If yes, I agree with
> you. I've often heard the complaints about a backup when restarting new
> standby. That's really big problem.
>
> The timeline mismatch after failover was one of the reasons why a
> backup is required. But, thanks to Heikki's recent work, that's solved,
> i.e., the timeline mismatch would be automatically resolved when
> starting replication in 9.3. So, the remaining problem is an
> inconsistent database.
>
>> One solution to avoid this situation is to have the master send WAL
>> records to the standby and wait for an ACK from the standby committing
>> the WAL files to disk, and only after that commit the data page related
>> to this transaction on the master.
>
> You mean to make the master wait the data page write until WAL has been
> not only flushed to disk but also replicated to the standby?
>
>> The main drawback would be increased wait time for the client due to the
>> extra round trip to the standby before the master sends the ACK to the
>> client. Are there any other issues with this approach?
>
> I think that you can introduce GUC specifying whether this extra check
> is required to avoid a backup when failback.
>
> Regards,
>
> --
> Fujii Masao
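To make the proposed ordering concrete, here is a minimal stand-alone sketch (illustrative names, not PostgreSQL code) of the rule under discussion: the data page write is gated on both the local WAL flush and the standby's ACK, so a failover before the ACK can never leave the old master's data files ahead of the new master's.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the proposed ordering: the master may write a data page
 * only after the WAL for that change has been flushed locally AND acked
 * by the synchronous standby.  All names are illustrative.
 */
typedef struct
{
    bool wal_flushed_local;
    bool wal_acked_by_standby;
    bool data_page_written;
} MasterState;

/* Standby flushes the streamed WAL to its disk, then acks. */
static void standby_ack(MasterState *m)
{
    m->wal_acked_by_standby = true;
}

/* Returns true if the data page write is allowed to proceed. */
static bool try_write_data_page(MasterState *m)
{
    if (m->wal_flushed_local && m->wal_acked_by_standby)
    {
        m->data_page_written = true;
        return true;
    }
    return false;               /* must keep waiting */
}
```

The cost of the extra gate is exactly the drawback named above: one additional replication round trip before the page write can proceed.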
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Monday, April 08, 2013 7:28 PM Rodrigo Barboza wrote:
> On Sat, Apr 6, 2013 at 10:39 PM, Rodrigo Barboza wrote:
>> Ok! I will try to reproduce in a smaller scenario.

On Sat, Apr 6, 2013 at 9:53 PM, Amit Kapila wrote:
> On Saturday, April 06, 2013 12:18 PM Rodrigo Barboza wrote:
>> On Sat, Apr 6, 2013 at 12:23 AM, Amit Kapila wrote:
>>> On Saturday, April 06, 2013 3:57 AM Rodrigo Barboza wrote:
>>>> Hello. I created a type my_uint that is an unsigned int 32. I am
>>>> trying to update data of a table that contains a column of this type.
>>>> Here is what happens:
>>>>
>>>> postgresql=> explain analyze UPDATE attribute_type_conf SET rowform =
>>>> rowform + 1 where rowform <= 18;
>>>> ERROR: unsupported type: 132852
>>>
>>> Can you post your complete test (like your type creation and its use
>>> for table and any initial data you loaded to it)?
>>
>> Well, it's a lot of data. May I send it attached?
>
> If you can't make it into a small reproducible test, then you can send it.

> I was trying to reproduce the error, but it was not raising the error.
> I didn't change anything.
> Last week I dropped the database, created it again, populated my db and
> when it was time to run the query, the error raised.
> I'm puzzled. I can't trust it...
>
> But now I run this script and the error finally raised. It seems random.

The reason for the seldom behavior is that it occurs only when the value you
are giving in your where clause lies within the valid boundary of the
histogram (refer to function ineq_histogram_selectivity).

> psql -U testuser testdb -c "drop table if exists tm32;"
> psql -U testuser testdb -c "create table tm32 (a tmuint32);"
> for ((i=0; i<100; i++)); do
> psql -U testuser testdb <<ENDOFSQLDATA
> insert into tm32 values($i);
> ENDOFSQLDATA
> done
> for ((i=0; i<100; i++)); do
> psql -U testuser testdb <<ENDOFSQLDATA
> BEGIN;
> UPDATE tm32 SET a = a + 1 WHERE a > $i;
> END;
> ENDOFSQLDATA
> done
>
> The error message:
> ERROR: unsupported type: 202886
> ROLLBACK

You have identified rightly in your other mail that it happens in function
convert_numeric_to_scalar().
But I think adding user-defined datatype handling in this function might not
be straightforward. You can refer to the following text from
http://www.postgresql.org/docs/9.2/static/xoper-optimization.html

"You can use scalarltsel and scalargtsel for comparisons on data types that
have some sensible means of being converted into numeric scalars for range
comparisons. If possible, add the data type to those understood by the
function convert_to_scalar() in src/backend/utils/adt/selfuncs.c.
(Eventually, this function should be replaced by per-data-type functions
identified through a column of the pg_type system catalog; but that hasn't
happened yet.) If you do not do this, things will still work, but the
optimizer's estimates won't be as good as they could be."

I could think of the following workarounds for your problem:

1. For your table, set the values of autovacuum_analyze_threshold and
autovacuum_analyze_scale_factor very high (refer to CREATE TABLE), so that
it doesn't analyze your table and returns the default selectivity, which
should work fine if your SQL statements are simple.

2. Write your own selectivity functions that return the default
selectivity, and use them when creating the operators.

3. Use a bind value in the where clause; it will return the default
selectivity.

4. Some other way with which it does not collect histogram stats (meaning
it will use minimal stats, compute_minimal_stats). I am not sure, but you
can try once without defining the operators.

All the above ways can help resolve your current problem, but they are not
a good way if you have significant usage of SQL statements with these
datatypes.

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
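For context, the error fires inside ineq_histogram_selectivity() when convert_to_scalar() cannot handle the custom type. The estimate that code would otherwise compute looks roughly like the following stand-alone model: the fraction of rows above a constant, derived from a sorted histogram of bin boundaries. This is a simplification for illustration, not the PostgreSQL source.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/*
 * Simplified model of what scalargtsel/ineq_histogram_selectivity do for
 * a type convert_to_scalar() understands: estimate the fraction of rows
 * with value > constval from a sorted histogram of bin boundaries.
 * Illustration only, not the PostgreSQL implementation.
 */
static double
histogram_gt_selectivity(const double *bounds, size_t nbounds, double constval)
{
    size_t i = 0;

    /* Outside the histogram range: clamp to 1 or 0. */
    if (constval < bounds[0])
        return 1.0;
    if (constval >= bounds[nbounds - 1])
        return 0.0;

    /* Find the bin containing constval. */
    while (i + 1 < nbounds && bounds[i + 1] <= constval)
        i++;

    /* Fraction of that bin above constval, plus the bins wholly above it,
     * assuming each of the (nbounds - 1) bins holds an equal row count. */
    double binfrac = (bounds[i + 1] - constval) / (bounds[i + 1] - bounds[i]);
    double bins_above = (double) (nbounds - 2 - i);
    return (binfrac + bins_above) / (double) (nbounds - 1);
}
```

Since the interpolation step needs the constant and the boundaries converted to a common numeric scale, a type convert_to_scalar() does not understand cannot get past the first step, which is exactly where the elog(ERROR) fires.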
Re: [HACKERS] WIP: index support for regexp search
Alexander Korotkov writes:
> On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane wrote:
>> I spent the weekend hacking on this, making a number of bug fixes and a
>> whole lot of cosmetic changes. I think there are large parts of this
>> that are in committable shape now, but I still find the actual graph
>> transformation logic to be mostly unintelligible. I think what's most
>> obscure is the distinction between the arcs list and the keys list of
>> each state in the expanded graph. I get the impression that the
>> general idea is for the arcs to represent exactly-known transitions
>> while the keys represent imprecisely-known transitions ... but there
>> seems to be at least some leakage between those categories. Could
>> you write down a specification for what's supposed to be happening
>> there?

> Here is my try to specify it.

Thanks. I hacked on this some more and committed it. I found a number of
bugs along the way with respect to handling of word boundaries
(partially-blank transition trigrams) and EOL-color ($) handling. I think
it's all fixed now, but it could definitely use some more study and testing.

One issue that bothered me is that the regression tests really don't
provide much visibility into what the code is doing. Some of the bugs had
to do with failing to generate expected trigrams, for instance col ~ 'foo
bar' only generating trigram "foo" and not "bar". This still led to getting
the right answer, so the error was invisible as far as the tests were
concerned. Is it worth thinking of a way to expose what the extract
function did at SQL level, so we could test more carefully?

regards, tom lane
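For readers following along, the trigrams in question come from padding each word and sliding a 3-character window over it. Below is a simplified stand-alone model of the extraction; the real make_trigrams() in contrib/pg_trgm also deduplicates and handles multibyte characters, and this sketch assumes ASCII input and the usual two-leading/one-trailing blank padding.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Simplified, illustrative model of pg_trgm-style trigram extraction:
 * split the string into alphanumeric words, pad each word with two
 * leading blanks and one trailing blank, and emit every 3-char window.
 * Not the contrib/pg_trgm implementation.
 */
static int
extract_trigrams(const char *s, char out[][4], int maxout)
{
    int n = 0;

    while (*s)
    {
        while (*s && !isalnum((unsigned char) *s))
            s++;                        /* skip word separators */
        if (!*s)
            break;

        char padded[256] = "  ";        /* two leading blanks */
        size_t len = 2;
        while (*s && isalnum((unsigned char) *s) && len < sizeof(padded) - 2)
            padded[len++] = (char) tolower((unsigned char) *s++);
        padded[len++] = ' ';            /* one trailing blank */
        padded[len] = '\0';

        for (size_t i = 0; i + 3 <= len && n < maxout; i++)
        {
            memcpy(out[n], padded + i, 3);
            out[n][3] = '\0';
            n++;
        }
    }
    return n;
}
```

In this model 'foo bar' yields eight trigrams including both "foo" and "bar", which is why only producing "foo" for that pattern was a bug even though the answer stayed correct.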
Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila wrote:
> I still have one more doubt. Consider the below scenario for cases when
> we invalidate buffers during moving to freelist v/s just move to
> freelist:
>
> Backend got the buffer from freelist for a request of page-9 (number 9
> is random, just to explain); it still has an association with another
> page-10. It needs to add the buffer with the new tag (new page
> association) in the buf hash table and remove the buffer with the oldTag
> (old page association).
>
> The benefit of just moving to freelist is that if we get a request for
> the same page before somebody else uses it for another page, it will
> save read I/O. However, on the other side, for many cases the backend
> will need an extra partition lock to remove the oldTag (which can lead
> to some bottleneck).
>
> I think saving read I/O is more beneficial, but I'm just not sure if
> that is best, as such cases might be rare.

I think saving read I/O is a lot more beneficial. I haven't seen evidence
of a severe bottleneck updating the buffer mapping tables. I have seen some
evidence of spinlock-level contention on read workloads that fit in shared
buffers, because in that case the system can run fast enough for the
spinlocks protecting the lwlocks to get pretty hot. But if you're doing
writes, or if the workload doesn't fit in shared buffers, other bottlenecks
slow you down enough that this doesn't really seem to become much of an
issue.

Also, even if you *can* find some scenario where pushing the buffer
invalidation into the background is a win, I'm not convinced that would
justify doing it, because the case where it's a huge loss - namely, working
set just a tiny bit smaller than shared_buffers - is pretty obvious. I
don't think we dare fool around with that; the townspeople will arrive with
pitchforks. I believe that the big win here is getting the clock sweep out
of the foreground so that BufFreelistLock doesn't catch fire. The buffer
mapping locks are partitioned and, while it's not like that completely gets
rid of the contention, it sure does help a lot. So I would view that goal
as primary, at least for now. If we get a first round of optimization done
in this area, that doesn't preclude further improving it in the future.

> Last time the following tests were executed to validate the results:
>
> Test suite - pgbench
> DB Size - 16 GB
> RAM - 24 GB
> Shared Buffers - 2G, 5G, 7G, 10G
> Concurrency - 8, 16, 32, 64 clients
> Pre-warm the buffers before start of test
>
> Shall we try any other scenarios, or are the above okay for an initial
> test of the patch?

Seems like a reasonable place to start.

...Robert
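For reference, the clock sweep under discussion can be sketched stand-alone as follows. Names are hypothetical; the real victim search lives in StrategyGetBuffer() and runs under BufFreelistLock, which is why moving it out of the foreground helps.

```c
#include <assert.h>

#define NBUFFERS  4
#define MAX_USAGE 5

/*
 * Minimal stand-alone sketch of the clock-sweep victim search the thread
 * discusses moving into the background.  Illustrative only.
 */
static int usage_count[NBUFFERS];
static int clock_hand = 0;

/* Called on a buffer hit: bump its usage count (saturating). */
static void pin_buffer(int buf)
{
    if (usage_count[buf] < MAX_USAGE)
        usage_count[buf]++;
}

/* Sweep: decrement counts until a buffer with usage 0 is found. */
static int clock_sweep_victim(void)
{
    for (;;)
    {
        int buf = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (usage_count[buf] == 0)
            return buf;         /* evict this one */
        usage_count[buf]--;     /* give it another lap to prove itself */
    }
}
```

Every backend that needs a free buffer pays for this loop today; pushing the sweep into a background process so backends can pop from a pre-filled freelist is the optimization being weighed against the cost of also invalidating the old tag in the background.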
Re: [HACKERS] Enabling Checksums
On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma wrote:
> Unless somebody tells me not to waste my time I'll go ahead and come
> up with a workable patch by Monday.

And here you go. I decided to be verbose with the comments, as it's easier
to delete a comment than to write one. I also left in a huge jumble of
macros to calculate the contents of a helper var at compile time. This can
easily be replaced with the calculated values once we settle on specific
parameters.

Currently only x86-64 is implemented. 32-bit x86 would be mostly a
copy-and-paste job, replacing 64-bit pointer registers with 32-bit ones.
For other platforms the simplest way would be to use a vectorizing compiler
on the generic variant; -funroll-loops -ftree-vectorize is enough on gcc.

Quick bench results on the worst-case workload:

master no checksums:   tps = 15.561848
master with checksums: tps = 1.695450
simd checksums:        tps = 14.602698

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

simd-checksums.patch
Description: Binary data
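The patch itself is attached as binary data and not shown here. As an illustration of the "vectorizing compiler on the generic variant" approach, a checksum loop written with independent lanes, like the sketch below, is the kind of code gcc can auto-vectorize with -funroll-loops -ftree-vectorize. The lane count and FNV-style constants are assumptions for illustration, not the algorithm in the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLANES    16
#define FNV_PRIME 16777619u

/*
 * Generic multi-lane checksum over a block.  The inner loop has no
 * cross-lane dependency, so a vectorizing compiler can map the lanes
 * onto SIMD registers.  Illustrative sketch only.
 */
static uint32_t
block_checksum(const uint32_t *data, size_t nwords)
{
    uint32_t lanes[NLANES];

    for (int j = 0; j < NLANES; j++)
        lanes[j] = 0x811c9dc5u + (uint32_t) j;      /* distinct seeds */

    /* nwords is assumed to be a multiple of NLANES for simplicity */
    for (size_t i = 0; i + NLANES <= nwords; i += NLANES)
        for (int j = 0; j < NLANES; j++)            /* vectorizable */
            lanes[j] = (lanes[j] ^ data[i + j]) * FNV_PRIME;

    /* Fold the lanes into one 32-bit result. */
    uint32_t result = 0;
    for (int j = 0; j < NLANES; j++)
        result = (result ^ lanes[j]) * FNV_PRIME;
    return result;
}
```

The scalar version of this loop is slow because each iteration depends on the previous one; splitting the state into independent lanes is what lets both the unroller and the vectorizer recover the throughput seen in the "simd checksums" numbers above.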
[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
On 9 April 2013 09:24, Josh Berkus wrote:
> As much as I have a keen interest in this feature, it isn't (AFAIK)
> being considered for 9.3. Given that it's generated a fair amount of
> controversy, could we table it until 9.3 beta? There's still plenty of
> unresolved 9.3 patches in the queue.

No problem. I certainly wasn't expecting it to run for 90 messages when I
started out. I'll pipe down for now and resume after the beta.

Cheers,
BJ
[HACKERS] Reassign variable value in XLogReadRecord
Hello,

While walking through the code I saw the following in
src/backend/access/transam/xlogreader.c:177-191:

XLogRecord *
XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
{
    XLogRecord *record;
    XLogRecPtr  targetPagePtr;
    bool        randAccess = false;   <=== assign
    uint32      len, total_len;
    uint32      targetRecOff;
    uint32      pageHeaderSize;
    bool        gotheader;
    int         readOff;

    randAccess = false;               <== reassign

    /* reset error state */

Am I missing something, or is the last one unnecessary?

Best regards.
--
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br
Re: [HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
On Mon, Apr 8, 2013 at 04:24:31PM -0700, Josh Berkus wrote:
> All,
>
> As much as I have a keen interest in this feature, it isn't (AFAIK)
> being considered for 9.3. Given that it's generated a fair amount of
> controversy, could we table it until 9.3 beta? There's still plenty of
> unresolved 9.3 patches in the queue.

Can someone add a TODO item or a link to the thread?

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
All,

As much as I have a keen interest in this feature, it isn't (AFAIK) being
considered for 9.3. Given that it's generated a fair amount of controversy,
could we table it until 9.3 beta? There's still plenty of unresolved 9.3
patches in the queue.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Fwd: Range types (DATERANGE, TSTZRANGE) in a foreign key with "inclusion" logic
Matthias Nagel wrote:
> I would like to do something like:
>
> FOREIGN KEY ( container_id, lifetime )
> REFERENCES other_table ( id, lifetime )
> USING gist ( container_id WITH =, lifetime WITH <@ )
>
> (Of course, this is PostgreSQL pseudo-code, but it hopefully makes
> clear what I want.)
>
> So, now my questions:
>
> 1) Does this kind of feature already exist in 9.2?

No.

> 2) If this feature does not directly exist, has anybody a good
> idea how to mimic the intended behaviour?

I would probably do it by using only SERIALIZABLE transactions to modify
data, and enforce the constraint in triggers. That would not be as
convenient or as performant as the feature you're requesting would probably
be, but it would work as long as you have a systematic way to retry
transactions which are rolled back with a serialization failure.

> 3) If neither 1) nor 2) applies, are there any plans to integrate
> such a feature?

I have heard Jeff Davis talk about the possibility of such a feature, which
he figures would be called "inclusion constraints". I have not heard of any
actual development on the idea yet.

> Having range types and exclusion constraints is nice, as I said
> in the introduction. But if the reverse (a foreign key with
> inclusion) also worked, the range type feature would really
> be amazing.

Agreed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
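To make the requested semantics concrete, here is a toy stand-alone model of the check such an "inclusion constraint" foreign key would enforce: each referencing row must find a referenced row with an equal key whose range contains (<@) the child's range. Integer half-open ranges stand in for DATERANGE, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the requested foreign key with range inclusion:
 *   (container_id, lifetime) REFERENCES other_table (id, lifetime)
 *   with container_id WITH = and lifetime WITH <@
 * Half-open [lo, hi) integer ranges stand in for DATERANGE.
 */
typedef struct { int id; int lo; int hi; } Row;

/* child.lifetime <@ parent.lifetime for half-open ranges */
static bool range_contained(const Row *child, const Row *parent)
{
    return parent->lo <= child->lo && child->hi <= parent->hi;
}

/* The check the constraint would enforce for one referencing row. */
static bool
fk_inclusion_ok(const Row *child, const Row *parents, size_t nparents)
{
    for (size_t i = 0; i < nparents; i++)
        if (parents[i].id == child->id && range_contained(child, &parents[i]))
            return true;
    return false;   /* would raise a foreign-key violation */
}
```

This is exactly the predicate a trigger-based workaround has to evaluate on every insert/update of the child and every update/delete of the parent, which is why the SERIALIZABLE requirement above matters: without it, two concurrent transactions could each see the check pass and jointly violate it.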
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Mon, Apr 8, 2013 at 4:30 PM, Rodrigo Barboza wrote:
> On Mon, Apr 8, 2013 at 12:14 PM, Rodrigo Barboza wrote:
>> On Mon, Apr 8, 2013 at 11:44 AM, Rodrigo Barboza wrote:
>>> On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza wrote:
>>>> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane wrote:
>>>>> Rodrigo Barboza writes:
>>>>>> UPDATE tm32 SET a = a + 1 WHERE a > $i;
>>>>>> ERROR: unsupported type: 202886
>>>>>
>>>>> I'm betting that's coming from scalargtsel, which doesn't know
>>>>> anything about your type, but you've nominated it to be the
>>>>> selectivity function for ">" anyway.
>>>>>
>>>>> /*
>>>>>  * Can't get here unless someone tries to use scalarltsel/scalargtsel
>>>>>  * on an operator with one numeric and one non-numeric operand.
>>>>>  */
>>>>> elog(ERROR, "unsupported type: %u", typid);
>>>>>
>>>>> regards, tom lane
>>>>
>>>> Yes, I found it in the code, but I followed the example from the
>>>> postgres documentation that uses this function.
>>>> And why does it work sometimes? Why not other times?
>>>
>>> Here is a very simple case with weird behavior. I select * from a table
>>> and it returns 4 entries. But when I run with a filter, the error
>>> raises and crazy values are printed from the params.
>>>
>>> Here are my functions where I compare the values:
>>>
>>> typedef uint32_t TmUInt32;
>>>
>>> static int
>>> tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
>>> {
>>>     int ret;
>>>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>>>     if (a < b) ret = -1;
>>>     else if (a > b) ret = 1;
>>>     else ret = 0;
>>>     elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__,
>>>          __LINE__, ret, a);
>>>     return ret;
>>> }
>>>
>>> PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);
>>>
>>> Datum
>>> tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
>>> {
>>>     TmUInt32 *param1;
>>>     int32_t param2;
>>>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>>>
>>>     if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();
>>>
>>>     param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
>>>     param2 = DatumGetInt32(PG_GETARG_DATUM(1));
>>>
>>>     elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n",
>>>          __FUNCTION__, __LINE__, *param1, param2);
>>>     PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
>>> }
>>>
>>> And here is the simple test.
>>>
>>> -- SIMPLE QUERY
>>> select * from a;
>>>
>>> NOTICE: funcao:tmuint32_out linha:191
>>> NOTICE: funcao:tmuint32_out linha:191
>>> NOTICE: funcao:tmuint32_out linha:191
>>> NOTICE: funcao:tmuint32_out linha:191
>>>
>>>  a
>>> ---
>>>  0
>>>  1
>>>  2
>>>  3
>>> (4 rows)
>>>
>>> -- QUERY WITH FILTER
>>> select * from a where a > 1;
>>>
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>>> NOTICE: funcao:tmuint32_int32_abs_gt linha:130
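An aside on the comparison function quoted above: tmuint32_int32_abs_cmp_internal() compares a uint32_t directly against an int32_t, and C's usual arithmetic conversions turn a negative int32 into a huge unsigned value before the comparison. This is independent of the scalargtsel error being discussed, but a safe pattern handles the sign explicitly, e.g.:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Safe uint32 vs int32 three-way comparison: reject negatives before
 * comparing, instead of letting the int32 be implicitly converted to
 * unsigned.  Illustrative helper, not part of the type posted above.
 */
static int
uint32_int32_cmp(uint32_t a, int32_t b)
{
    if (b < 0)
        return 1;               /* any unsigned value is greater */
    uint32_t ub = (uint32_t) b;
    if (a < ub) return -1;
    if (a > ub) return 1;
    return 0;
}
```

With the naive mixed comparison, 0 < -1 evaluates as 0 < 4294967295, i.e. true, which would make an unsigned type's operators silently disagree with their intended semantics for negative right-hand operands.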
Re: [HACKERS] Back branches vs. gcc 4.8.0
On 09/04/13 08:41, Peter Eisentraut wrote:
> On 4/5/13 6:14 PM, Tom Lane wrote:
>> Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
>> I think we need to do something to prevent it from breaking 8.4.x and
>> 9.0.x. It looks like our choices are (1) teach configure to enable
>> -fno-aggressive-loop-optimizations if the compiler recognizes it,
>> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
>>
>> I'm a bit leaning towards (1), mainly because I'm not excited about
>> fighting a compiler arms race in the back branches.
>
> At the moment, I wouldn't do anything. At least until we have converted
> master to use flexible array members completely, and we have learned the
> extent of the issue. The problem manifests itself easily through the
> regression tests, so there is no guessing about whether a particular
> combination of versions will work. Someone who uses a cutting-edge
> compiler with a somewhat old PG release is doing something special
> anyway, so they should have the required skills to put in the workaround.
>
> I would rather avoid patching in specific compiler options for specific
> versions. These things come and go, but releases live a long time. How
> do we know -fno-aggressive-loop-optimizations is the only option we need
> in the long run? I'd rather see a direct crash or a known code fix.
>
> As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more
> and it will begin to read like -fmy-code -fis-broken -fhelp-me ;-)

-fno-break-my-code
Re: [HACKERS] Back branches vs. gcc 4.8.0
On 4/5/13 6:14 PM, Tom Lane wrote:
> Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
> I think we need to do something to prevent it from breaking 8.4.x and
> 9.0.x. It looks like our choices are (1) teach configure to enable
> -fno-aggressive-loop-optimizations if the compiler recognizes it,
> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
>
> I'm a bit leaning towards (1), mainly because I'm not excited about
> fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything. At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue. The problem manifests itself easily through the
regression tests, so there is no guessing about whether a particular
combination of versions will work. Someone who uses a cutting-edge compiler
with a somewhat old PG release is doing something special anyway, so they
should have the required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions. These things come and go, but releases live a long time. How do
we know -fno-aggressive-loop-optimizations is the only option we need in
the long run? I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more and
it will begin to read like -fmy-code -fis-broken -fhelp-me ;-)
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Mon, Apr 8, 2013 at 12:14 PM, Rodrigo Barboza wrote:
> On Mon, Apr 8, 2013 at 11:44 AM, Rodrigo Barboza wrote:
>> On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza wrote:
>>> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane wrote:
>>>> Rodrigo Barboza writes:
>>>>> UPDATE tm32 SET a = a + 1 WHERE a > $i;
>>>>> ERROR: unsupported type: 202886
>>>>
>>>> I'm betting that's coming from scalargtsel, which doesn't know
>>>> anything about your type, but you've nominated it to be the
>>>> selectivity function for ">" anyway.
>>>>
>>>> /*
>>>>  * Can't get here unless someone tries to use scalarltsel/scalargtsel
>>>>  * on an operator with one numeric and one non-numeric operand.
>>>>  */
>>>> elog(ERROR, "unsupported type: %u", typid);
>>>>
>>>> regards, tom lane
>>>
>>> Yes, I found it in the code, but I followed the example from the
>>> postgres documentation that uses this function.
>>> And why does it work sometimes? Why not other times?
>>
>> Here is a very simple case with weird behavior. I select * from a table
>> and it returns 4 entries. But when I run with a filter, the error raises
>> and crazy values are printed from the params.
>>
>> Here are my functions where I compare the values:
>>
>> typedef uint32_t TmUInt32;
>>
>> static int
>> tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
>> {
>>     int ret;
>>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>>     if (a < b) ret = -1;
>>     else if (a > b) ret = 1;
>>     else ret = 0;
>>     elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__,
>>          __LINE__, ret, a);
>>     return ret;
>> }
>>
>> PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);
>>
>> Datum
>> tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
>> {
>>     TmUInt32 *param1;
>>     int32_t param2;
>>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>>
>>     if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();
>>
>>     param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
>>     param2 = DatumGetInt32(PG_GETARG_DATUM(1));
>>
>>     elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n",
>>          __FUNCTION__, __LINE__, *param1, param2);
>>     PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
>> }
>>
>> And here is the simple test.
>>
>> -- SIMPLE QUERY
>> select * from a;
>>
>> NOTICE: funcao:tmuint32_out linha:191
>> NOTICE: funcao:tmuint32_out linha:191
>> NOTICE: funcao:tmuint32_out linha:191
>> NOTICE: funcao:tmuint32_out linha:191
>>
>>  a
>> ---
>>  0
>>  1
>>  2
>>  3
>> (4 rows)
>>
>> -- QUERY WITH FILTER
>> select * from a where a > 1;
>>
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
>> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 1, param2: 1
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
>> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 0 a: 1
>>
>> ERROR: unsupported type: 220200

I found that the problem is in the highlighted line.
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade wrote:
> Hello,
>
> We have been trying to figure out possible solutions to the following
> problem in streaming replication. Consider the following scenario:
>
> If the master receives a commit command, it writes and flushes the commit
> WAL records to disk. It also writes and flushes the data page related to
> this transaction.
>
> The master then sends WAL records to the standby up to the commit WAL
> record. But if failover happens before these records are sent, the old
> master is ahead of the standby (which is now the new master) in terms of
> DB data, leading to inconsistent data.

Why do you think that the inconsistent data after failover happens is
problem? Because it's one of the reasons why a fresh base backup is
required when starting old master as new standby? If yes, I agree with you.
I've often heard the complaints about a backup when restarting new standby.
That's really big problem.

The timeline mismatch after failover was one of the reasons why a backup is
required. But, thanks to Heikki's recent work, that's solved, i.e., the
timeline mismatch would be automatically resolved when starting replication
in 9.3. So, the remaining problem is an inconsistent database.

> One solution to avoid this situation is to have the master send WAL
> records to the standby and wait for an ACK from the standby committing
> the WAL files to disk, and only after that commit the data page related
> to this transaction on the master.

You mean to make the master wait the data page write until WAL has been not
only flushed to disk but also replicated to the standby?

> The main drawback would be increased wait time for the client due to the
> extra round trip to the standby before the master sends the ACK to the
> client. Are there any other issues with this approach?

I think that you can introduce GUC specifying whether this extra check is
required to avoid a backup when failback.
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Mon, Apr 8, 2013 at 7:38 PM, Andres Freund wrote:
> On 2013-04-08 19:26:33 +0300, Ants Aasma wrote:
>> Not exactly. Sync-rep ensures that commit success is not sent to the client before a synchronous replica acks the commit record. What Samrat is proposing here is that WAL is not flushed to the OS before it is acked by a synchronous replica, so recovery won't go past the timeline change made in failover, removing the need to take a new base backup to resync with the new master. I seem to remember this being discussed when sync rep was committed. I don't recall if the idea was discarded only on performance grounds or whether there were other issues too.
>
> That's not going to work for a fair number of reasons:
> * WAL is streamed *from disk*, not from memory

Yeah, this one alone makes the do-not-flush-before-replicating approach impractical.

> * what if the local node crashes/restarts immediately? Then the standby is farther ahead than the master.
> * the performance implications of never writing data before flushing it are pretty severe
> * ...
>
> So this doesn't seem to solve anything.

Yeah, delaying WAL writes until replication is successful seems impractical, but I don't see why we couldn't optionally take walsender write pointers into account when considering whether we can write out a page. Sure, there will be some performance hit from waiting to replicate WAL, but on the other hand, having to rsync a huge database isn't too good for performance either.

Regards, Ants Aasma
-- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-08 19:26:33 +0300, Ants Aasma wrote:
> On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas wrote:
>> On 04/08/2013 05:34 AM, Samrat Revagade wrote:
>>> One solution to avoid this situation is to have the master send WAL records to the standby and wait for an ACK from the standby committing WAL files to disk, and only after that commit the data page related to this transaction on the master.
>>
>> Isn't this basically what synchronous replication does in PG 9.1+?
>
> Not exactly. Sync-rep ensures that commit success is not sent to the client before a synchronous replica acks the commit record. What Samrat is proposing here is that WAL is not flushed to the OS before it is acked by a synchronous replica, so recovery won't go past the timeline change made in failover, removing the need to take a new base backup to resync with the new master. I seem to remember this being discussed when sync rep was committed. I don't recall if the idea was discarded only on performance grounds or whether there were other issues too.

That's not going to work for a fair number of reasons:
* WAL is streamed *from disk*, not from memory
* what if the local node crashes/restarts immediately? Then the standby is farther ahead than the master.
* the performance implications of never writing data before flushing it are pretty severe
* ...

So this doesn't seem to solve anything.

Greetings, Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
2013/4/5 Robert Haas:
> On Thu, Apr 4, 2013 at 8:26 AM, Kohei KaiGai wrote:
>> OK, I follow the manner of the terminology as we usually call it. The attached patch just replaced the things you suggested.
>
> Thanks, I have committed this, after making some changes to the comments and documentation. Please review the changes and let me know if you see any mistakes.

Thanks. I found two obvious wording issues here; please see the smaller of the attached patches. I didn't fix up the manner of using "XXX" in source code comments.

Also, the attached function-execute-permission patch is a rebased version. I rethought its event name: it should be OAT_FUNCTION_EXECUTE, rather than OAT_FUNCTION_EXEC, following the convention of avoiding abbreviation. The other portions are the same as the previous ones.

> BTW, if it were possible to set things up so that the test_sepgsql script could validate the version of the sepgsql-regtest policy installed, that would eliminate a certain category of errors. I notice also that the regression tests themselves seem to fail if you invoke the script as contrib/sepgsql/test_sepgsql rather than ./test_sepgsql, which suggests another possible pre-validation step.

Please see the test-script-fixup patch. I added "cd `dirname $0`" at the top of the script; it keeps pg_regress out of this trouble. Probably pg_regress was unable to read the SQL commands to run.

A problem with validating the sepgsql-regtest policy module is that the semodule command requires root privilege to list the installed policy modules, so I avoided using this command in the test_sepgsql script. However, I have an idea: don't make the script fail even if "sudo semodule -l" returns an error, except in the case where it runs correctly and the policy version is not the expected one. What is your opinion of this check?
Thanks, -- KaiGai Kohei

pgsql-v9.3-obvious-wording-fixes.patch Description: Binary data
sepgsql-v9.3-test-script-fixup.v1.patch Description: Binary data
sepgsql-v9.3-function-execute-permission.v4.patch Description: Binary data
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas wrote:
> On 04/08/2013 05:34 AM, Samrat Revagade wrote:
>> One solution to avoid this situation is to have the master send WAL records to the standby and wait for an ACK from the standby committing WAL files to disk, and only after that commit the data page related to this transaction on the master.
>
> Isn't this basically what synchronous replication does in PG 9.1+?

Not exactly. Sync-rep ensures that commit success is not sent to the client before a synchronous replica acks the commit record. What Samrat is proposing here is that WAL is not flushed to the OS before it is acked by a synchronous replica, so recovery won't go past the timeline change made in failover, removing the need to take a new base backup to resync with the new master. I seem to remember this being discussed when sync rep was committed. I don't recall if the idea was discarded only on performance grounds or whether there were other issues too.

Thinking about it now, the requirement is that after a crash and failover to a sync replica, we should be able to reuse the datadir to replicate from the new master without inconsistency. We should be able to achieve that by ensuring that we don't write out pages until we have received an ack from the sync replica, and that we check for possible timeline switches before recovering local WAL. For the first part, it seems to me that it should be enough to rework the updating of XlogCtl->LogwrtResult.Flush so it accounts for the sync replica. For the second part, I think Heikki's work on enabling timeline switches over streaming connections already ensures this (I haven't checked it out in detail), but if not, it shouldn't be too hard to add.

Regards, Ants Aasma
-- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
On Apr 6, 2013, at 1:24 PM, Jeff Janes wrote:
> On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas wrote:
>> Incidentally, I bumped into another custom backup script just a few weeks back that also excluded backup_label. I don't know what the author was thinking when he wrote that, but it seems to be a surprisingly common mistake. Maybe it's the "label" in the filename that makes people think it's not important.
>
> I think part of it is the name "label", and part of it is that this file is similar to and hence easily confused with the .history files, which (as far as I know) truly are there only for human information and not for system operation.

While the backup_label file was included in all base backups by the custom backup script, it was not present in the cluster directory when starting PG 9.2.4, because the data directory was synced from the base backup without backup_label. Sure enough, including backup_label in the data directory fixes this for me.

I think these custom scripts were written around 8.1. That the base backups have worked all this time (each base backup is tested) is probably because the backup script, immediately after calling pg_start_backup(), rsyncs pg_control before rsyncing everything else in the data directory. Although it seems this quirk allows for problems if WAL files are generated between the time pg_start_backup() is called and pg_control is rsynced.

>> Perhaps we should improve the documentation to make it more explicit that backup_label must be included in the backup. The docs already say that, though, so I suspect that people making this mistake have not read the docs very carefully anyway.
>
> I don't think the docs are very clear on that. They say "This file will of course be archived as a part of your backup dump file", but "will be" does not imply "must be". Elsewhere it emphasizes that the label you gave to pg_start_backup is written into the file, but doesn't really say what the file itself is there for. To me it seems to imply that the file is there for your convenience, to hold that label, and not as a critical part of the system.
>
> Patch attached, which I hope can be back-patched. I'll also add it to commitfest-Next.
>
> Cheers, Jeff

I think this documentation update would be helpful. Thank you for your help,

- Jeff
-- Jeff Bohmer | Corporate Technology Manager | VisionLink, Inc.
Re: [HACKERS] Inconsistent DB data in Streaming Replication
Samrat Revagade writes:
> We have been trying to figure out possible solutions to the following problem in streaming replication. Consider the following scenario:
> If the master receives a commit command, it writes and flushes the commit WAL records to disk, and it also writes and flushes the data page related to this transaction.
> The master then sends WAL records to the standby up to the commit WAL record. But if failover happens before these records are sent, the old master is ahead of the standby, which is now the new master, in terms of DB data, leading to inconsistent data.

I don't exactly see the problem ... unless you're imagining that master and slave share the same data storage or something like that. That's not going to work for a ton of reasons besides this one.

regards, tom lane
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 04/08/2013 05:34 AM, Samrat Revagade wrote:
> One solution to avoid this situation is to have the master send WAL records to the standby and wait for an ACK from the standby committing WAL files to disk, and only after that commit the data page related to this transaction on the master.

Isn't this basically what synchronous replication does in PG 9.1+?

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com
[HACKERS] Inconsistent DB data in Streaming Replication
Hello,

We have been trying to figure out possible solutions to the following problem in streaming replication. Consider the following scenario:

When the master receives a commit command, it writes and flushes the commit WAL records to disk, and it also writes and flushes the data page related to this transaction.

The master then sends WAL records to the standby up to the commit WAL record. But if failover happens before these records are sent, the old master is ahead of the standby, which is now the new master, in terms of DB data, leading to inconsistent data.

One solution to avoid this situation is to have the master send WAL records to the standby and wait for an ACK from the standby committing the WAL files to disk, and only after that commit the data page related to this transaction on the master.

The main drawback would be increased wait time for the client due to the extra round trip to the standby before the master sends the ACK to the client. Are there any other issues with this approach?

Thank you, Samrat
Re: [HACKERS] unused code in float8_to_char , formatting.c ?
On 7 Apr 2013, at 05:14, Robert Haas wrote:
> On Thu, Apr 4, 2013 at 6:47 PM, Greg Jaskiewicz wrote:
>> Looking around the code today, one of my helpful tools detected this dead code. As far as I can see, it is actually an unused call to strlen() in formatting.c, float8_to_char().
>
> I poked at this a little and suggest the following somewhat more extensive cleanup.
>
> It seems to me that there are a bunch of these functions where len is unconditionally initialized in NUM_TOCHAR_prepare and then used there. Similarly in NUM_TOCHAR_cleanup. And then there's a chunk of each individual function that does it a third time. Rather than use the same variable in all three places, I've moved the variable declarations to the innermost possible scope. Doing that revealed a bunch of other, similar places where we can get rid of strlen() calls.
>
> Does this version seem like a good idea?

Looks more extensive :-) On a quick glance, without a lot of testing, it looks OK. But the lack of test cases stressing all the different cases in that file makes it impossible to say that there are no regressions. Personally I always feel uneasy making extensive changes in complicated code like this without any integrated test case(s).

-- GJ
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
Joe Conway writes:
> OK, maybe I'll try to take a look in the meantime.

That would be awesome :)

> Did you have any comment on the other pg_dump patch (reviewed by Vibhor)?

This whole extension table filtering and dumping is more in Tom's realm, so I guess that if you want to have another pair of eyes on your patch before committing it yourself, you will need him to have a look. That said, I didn't spot anything obvious that I want to report myself, so I would label it "ready for committer".

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Mon, Apr 8, 2013 at 11:44 AM, Rodrigo Barboza wrote: > > > > On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza > wrote: > >> >> >> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane wrote: >> >>> Rodrigo Barboza writes: >>> > UPDATE tm32 SET a = a + 1 WHERE a > $i; >>> > ERROR: unsupported type: 202886 >>> >>> I'm betting that's coming from scalargtsel, which doesn't know anything >>> about your type, but you've nominated it to be the selectivity function >>> for ">" anyway. >>> >>> /* >>> * Can't get here unless someone tries to use >>> scalarltsel/scalargtsel on >>> * an operator with one numeric and one non-numeric operand. >>> */ >>> elog(ERROR, "unsupported type: %u", typid); >>> >>> regards, tom lane >>> >> >> >> Yes, I found it in the code, but I followed the example from the postgres >> documentation that uses this function. >> And why does it work sometimes? Why not other times? >> >> > > Here is a very simple case and weird behavior. I select * from a table and > returns 4 entries. > But when I run with a filter the error raises and crazy values are printed > from the params. 
> Here are my functions where I compare the values:
>
> typedef uint32_t TmUInt32;
>
> static int
> tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
> {
>     int ret;
>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>     if (a < b) ret = -1;
>     else if (a > b) ret = 1;
>     else ret = 0;
>     elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__, __LINE__, ret, a);
>     return ret;
> }
>
> PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);
>
> Datum
> tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
> {
>     TmUInt32 *param1;
>     int32_t param2;
>     elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>
>     if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();
>
>     param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
>     param2 = DatumGetInt32(PG_GETARG_DATUM(1));
>
>     elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n", __FUNCTION__, __LINE__, *param1, param2);
>     PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
> }
>
> And here is the simple test.
>
> -- SIMPLE QUERY
> select * from a;
> NOTICE: funcao:tmuint32_out linha:191
> NOTICE: funcao:tmuint32_out linha:191
> NOTICE: funcao:tmuint32_out linha:191
> NOTICE: funcao:tmuint32_out linha:191
>  a
> ---
>  0
>  1
>  2
>  3
> (4 rows)
>
> -- QUERY WITH FILTER
> select * from a where a > 1;
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
> NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 1, param2: 1
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
> NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 0 a: 1
> ERROR: unsupported type: 220200

I found that the problem is in the highlighted line. I'm getting the wrong value from param1. But why this behavior?

PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);

Datum
tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
{
    TmUInt32 *param1;
    int32_t param2;
    elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, *
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
On 04/08/2013 07:42 AM, Dimitri Fontaine wrote:
> Joe Conway writes:
>> Shouldn't that CREATE RULE be implicitly part of the CREATE EXTENSION?
>
> Yes. It's a bug, been reported before, it's on my todo list. I have arranged some time to care about it while in beta, I won't be able to have at it before then…

OK, maybe I'll try to take a look in the meantime. Did you have any comment on the other pg_dump patch (reviewed by Vibhor)?

Thanks, Joe
-- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Re: [HACKERS] [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi
Tom Lane writes:
> How exactly would you know whether the previous installation was built without HAVE_WCSTOMBS/HAVE_TOWLOWER? That's not exposed anywhere reliable. And it's not out of the question that somebody upgrading to a newer PG version might upgrade his OS too, so I would not think that checking what configure says now is trustworthy.

Oh, it's even worse than I imagined then. If you don't know where to find the information, maybe the release notes should just propose to REINDEX in any case?

> In general, though, this suggestion seems about two orders of magnitude more work than the particular case justifies. We might want to think about something like it for future releases, but it's not likely to happen for 9.3.

Yeah, I've been thinking the same since sending the email proposal. It might just be some cumulative effect that led me to that proposal. I still think that a tool able to assess whether you're affected by "reindex GiST indexes on that datatype" or suchlike would be really good to have.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza wrote:
> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane wrote:
>> Rodrigo Barboza writes:
>>> UPDATE tm32 SET a = a + 1 WHERE a > $i;
>>> ERROR: unsupported type: 202886
>>
>> I'm betting that's coming from scalargtsel, which doesn't know anything about your type, but you've nominated it to be the selectivity function for ">" anyway.
>>
>>     /*
>>      * Can't get here unless someone tries to use scalarltsel/scalargtsel on
>>      * an operator with one numeric and one non-numeric operand.
>>      */
>>     elog(ERROR, "unsupported type: %u", typid);
>>
>> regards, tom lane
>
> Yes, I found it in the code, but I followed the example from the postgres documentation that uses this function. And why does it work sometimes? Why not other times?

Here is a very simple case with weird behavior. I select * from a table and it returns 4 entries. But when I run with a filter, the error is raised and crazy values are printed from the params.

Here are my functions where I compare the values:

typedef uint32_t TmUInt32;

static int
tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
{
    int ret;
    elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
    if (a < b) ret = -1;
    else if (a > b) ret = 1;
    else ret = 0;
    elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__, __LINE__, ret, a);
    return ret;
}

PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);

Datum
tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
{
    TmUInt32 *param1;
    int32_t param2;
    elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);

    if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();

    param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
    param2 = DatumGetInt32(PG_GETARG_DATUM(1));

    elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n", __FUNCTION__, __LINE__, *param1, param2);
    PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
}

And here is the simple test.

-- SIMPLE QUERY
select * from a;
NOTICE: funcao:tmuint32_out linha:191
NOTICE: funcao:tmuint32_out linha:191
NOTICE: funcao:tmuint32_out linha:191
NOTICE: funcao:tmuint32_out linha:191
 a
---
 0
 1
 2
 3
(4 rows)

-- QUERY WITH FILTER
select * from a where a > 1;
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2
NOTICE: funcao:tmuint32_int32_abs_gt linha:1296
NOTICE: funcao:tmuint32_int32_abs_gt linha:1303 param1: 1, param2: 1
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:742
NOTICE: funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 0 a: 1
ERROR: unsupported type: 220200
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
Joe Conway writes:
> Shouldn't that CREATE RULE be implicitly part of the CREATE EXTENSION?

Yes. It's a bug, been reported before, and it's on my todo list. I have arranged some time to look at it during beta; I won't be able to get to it before then…

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
On Thu, Apr 4, 2013 at 4:10 PM, Merlin Moncure wrote:
> The only reasonable answer for this (a provably used, non-security, non-standards violating, non-gross functionality breakage case) is *zero*. Our historically cavalier attitude towards compatibility breakage has been an immense disservice to our users and encourages very bad upgrade habits and is, IMNSHO, embarrassing.

The flip side of this is that the ability to make improvements freely is one of the biggest strengths of free software over commercial software. Oracle and Microsoft invest *tons* of money in maintaining huge libraries of backward-compatibility code, which drastically limits their ability to keep making improvements.

-- greg
Re: [HACKERS] [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi
Dimitri Fontaine writes:
> Tom Lane writes:
>> If there is anybody still using Postgres on machines without wcstombs() or towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail incorrectly. It seems likely that there are no such installations, though.
>
> Those conditions seem just complex enough to require a test script that will check that for you. What if we created a new binary responsible for auto checking all those release-note items that are possible to machine check, then issue a WARNING containing the URL to the release notes you should be reading, and a SQL script (ala pg_upgrade) to run after upgrade?

How exactly would you know whether the previous installation was built without HAVE_WCSTOMBS/HAVE_TOWLOWER? That's not exposed anywhere reliable. And it's not out of the question that somebody upgrading to a newer PG version might upgrade his OS too, so I would not think that checking what configure says now is trustworthy.

In general, though, this suggestion seems about two orders of magnitude more work than the particular case justifies. We might want to think about something like it for future releases, but it's not likely to happen for 9.3.

regards, tom lane
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane wrote:
> Rodrigo Barboza writes:
>> UPDATE tm32 SET a = a + 1 WHERE a > $i;
>> ERROR: unsupported type: 202886
>
> I'm betting that's coming from scalargtsel, which doesn't know anything about your type, but you've nominated it to be the selectivity function for ">" anyway.
>
>     /*
>      * Can't get here unless someone tries to use scalarltsel/scalargtsel on
>      * an operator with one numeric and one non-numeric operand.
>      */
>     elog(ERROR, "unsupported type: %u", typid);
>
> regards, tom lane

Yes, I found it in the code, but I followed the example from the postgres documentation that uses this function. And why does it work sometimes? Why not other times?
Re: [HACKERS] Unrecognized type error (postgres 9.1.4)
Rodrigo Barboza writes:
> UPDATE tm32 SET a = a + 1 WHERE a > $i;
> ERROR: unsupported type: 202886

I'm betting that's coming from scalargtsel, which doesn't know anything about your type, but you've nominated it to be the selectivity function for ">" anyway.

    /*
     * Can't get here unless someone tries to use scalarltsel/scalargtsel on
     * an operator with one numeric and one non-numeric operand.
     */
    elog(ERROR, "unsupported type: %u", typid);

regards, tom lane
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 8 April 2013 16:09, Tom Lane wrote: > Brendan Jurd writes: >> On the specific issue of CARDINALITY, I guess we need to decide >> whether we are going to pretend that our array/matrix thing is >> actually nested. I first argued that we should not. But it occurred >> to me that if we do pretend, it would at least leave the door ajar if >> we want to do something to make our arrays more nest-like in future, >> without disrupting the behaviour of CARDINALITY. > > This seems to be exactly the same uncertainty that we couldn't resolve > back in the 8.4 devel cycle, for exactly the same reasons. I don't see > that the discussion has moved forward any :-( > I had a poke around in the archives, and it seems to me that the major argument that was advanced in favour of making cardinality() return the total number of items was ... we don't have anything that does that yet. That's why I'm proposing we add array_num_items as well -- I do think there should be a function for this, I just don't think cardinality fits the bill. Cheers, BJ
Re: [HACKERS] commit dfda6ebaec67 versus wal_keep_segments
On 03.04.2013 22:50, Jeff Janes wrote: On Wed, Apr 3, 2013 at 11:14 AM, Heikki Linnakangas wrote: On 03.04.2013 18:58, Jeff Janes wrote: On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes wrote: This commit introduced a problem with wal_keep_segments: commit dfda6ebaec6763090fb78b458a979b558c50b39b The problem seems to be that the underflow warned about is happening, because the check meant to guard against it was checking the wrong thing. However, I don't really understand KeepLogSeg. It seems like segno, and hence recptr, don't actually serve any purpose. Hmm, the check is actually correct, but the assignment in the else-branch isn't. The idea of KeepLogSeg is to calculate recptr - wal_keep_segments and assign that to *logSegNo, but only if *logSegNo is not already less than the calculated value. Does the attached look correct to you? Let me describe what I think is going on: "On start, recptr is the redo location of the just-completed checkpoint, and logSegNo is the redo location segment of the checkpoint before that one. We want to keep the previous checkpoint's redo location, and we also want to keep wal_keep_segments before the current checkpoint's redo location, so we take whichever is earlier." If my understanding is now correct, then I think your patch looks correct. (Also, applying it fixed the problem I was having.) Ok, thanks, applied. Why do we keep wal_keep_segments before the just-finished checkpoint, rather than keeping that many before the previous checkpoint? It seems like it would be more intuitive (to the DBA) for that parameter to mean "keep this many more segments than you otherwise would". I'm not proposing we change it, I'm just curious about why it is done that way. It feels more intuitive to me the way it is. wal_keep_segments means "make sure there are always this many old WAL segments available in the server, regardless of any other settings". 
If you have a standby, it means that you don't need a new base backup as long as you don't fall behind the master by more than wal_keep_segments segments. On 03.04.2013 21:33, Alvaro Herrera wrote: "by by" Fixed, thanks. - Heikki
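The clamping behavior discussed above can be sketched in isolation. The following is a simplified, self-contained sketch of the intended logic, not the actual PostgreSQL code: here XLogSegNo is just a 64-bit segment number, recptr is passed directly as a segment number rather than a WAL pointer, and wal_keep_segments stands in for the GUC.

```c
#include <assert.h>

typedef unsigned long long XLogSegNo;

/* stand-in for the wal_keep_segments GUC */
static XLogSegNo wal_keep_segments = 5;

/*
 * Compute recptr_seg - wal_keep_segments, guarding against underflow,
 * and assign it to *logSegNo -- but only if *logSegNo is not already
 * less than the calculated value (i.e. only move the cutoff backwards,
 * never forwards, so other retention reasons still win).
 */
static void
KeepLogSeg(XLogSegNo recptr_seg, XLogSegNo *logSegNo)
{
    XLogSegNo segno;

    if (recptr_seg <= wal_keep_segments)
        segno = 1;              /* can't go below the first segment */
    else
        segno = recptr_seg - wal_keep_segments;

    if (segno < *logSegNo)
        *logSegNo = segno;
}
```

With this shape, the bug described above (the underflow the comment warned about) cannot occur: the subtraction only happens when recptr_seg is strictly greater than wal_keep_segments.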
Re: [HACKERS] WIP: index support for regexp search
On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane wrote: > Alexander Korotkov writes: > > [ trgm-regexp-0.15.patch.gz ] > > I spent the weekend hacking on this, making a number of bug fixes and a > whole lot of cosmetic changes. I think there are large parts of this > that are in committable shape now, but I still find the actual graph > transformation logic to be mostly unintelligible. I think what's most > obscure is the distinction between the arcs list and the keys list of > each state in the expanded graph. I get the impression that the > general idea is for the arcs to represent exactly-known transitions > while the keys represent imprecisely-known transitions ... but there > seems to be at least some leakage between those categories. Could > you write down a specification for what's supposed to be happening > there? > Here is my attempt to specify it. First, some notions (I know they are already in the comments, but let me put them together). An extended color is any color of the source CNFA, or one of two special values: 1) the unknown color, which may represent any character, alphanumeric or not; 2) the blank color, which may represent any non-alphanumeric character. A prefix is the extended colors of the last two characters read by the CNFA. A key is a pair of a CNFA state and a prefix; in other words, a key is an extended state carrying the additional information that can influence which trigrams come next. If you are in some key and traverse a CNFA arc, you move into another key, and there are two possible cases (or sometimes both): 1) Moving from one key into the other necessarily means reading some trigram; then you create a new arc labeled with that trigram. 2) You move into the other key without necessarily reading a useful trigram, for example because the source CNFA arc is labeled by an "unextractable" color; then you add a new key to the "keys" array, and the outgoing arcs from that key are later processed just like those of the source key. 
Therefore the "keys" array is the set of keys reachable from "stateKey" without reading a useful trigram. We could get rid of the "keys" array and instead produce "empty arcs" in the second case: you can think of the "keys" array as the set of keys reachable from "stateKey" via such empty arcs, and states connected by empty arcs could then be merged in a later stage. However, the reason for having a separate addKeys stage is optimization: in the addKey function, more general keys absorb less general ones, and in many cases the resulting graph becomes much simpler because of this. For example, if your regex is not anchored at the start, the engine puts self-referencing arcs of all possible colors on the initial state, and straightforward processing of that could produce an enormous output graph. I ran into a similar situation in an early version of the patch, where keys didn't absorb each other. -- With best regards, Alexander Korotkov.
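The "more general keys absorb less general ones" idea can be illustrated with a toy model. Everything below is a hypothetical sketch, not the patch's actual code: the type names, the color constants, and the containment rule are illustrative stand-ins. The idea is that a prefix position holding the unknown color matches any character, so it is at least as general as any specific color in that position; a key whose prefix contains another key's prefix (for the same CNFA state) can absorb it.

```c
#include <assert.h>
#include <stdbool.h>

typedef int TrgmColor;

#define COLOR_UNKNOWN (-1)     /* may represent any character */
#define COLOR_BLANK   (-2)     /* may represent any non-alphanumeric character */

/* extended colors of the last two characters read by the CNFA */
typedef struct
{
    TrgmColor colors[2];
} TrgmPrefix;

/* is color a at least as general as color b? */
static bool
color_contains(TrgmColor a, TrgmColor b)
{
    return a == COLOR_UNKNOWN || a == b;
}

/*
 * prefix1 "contains" prefix2 if every character sequence matching
 * prefix2 also matches prefix1; a key with the more general prefix
 * can then absorb the more specific key for the same CNFA state.
 */
static bool
prefix_contains(const TrgmPrefix *p1, const TrgmPrefix *p2)
{
    return color_contains(p1->colors[0], p2->colors[0]) &&
           color_contains(p1->colors[1], p2->colors[1]);
}
```

In the non-anchored-regex example above, the fully unknown prefix absorbs every specific prefix for the initial state, which is what keeps the output graph from exploding.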
Re: [HACKERS] Patch for removng unused targets
> From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Alexander Korotkov writes: > > On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane wrote: > >> But having said that, I'm wondering (without having read the patch) > >> why you need anything more than the existing "resjunk" field. > > > Actually, I don't know all the cases when the "resjunk" flag is set. Is it > > reliable to decide a target is used only for "ORDER BY" if it's "resjunk" > > and neither system nor used in grouping? If so, or if there are other > > cases which are easy to determine, then I'll remove the "resorderbyonly" flag. > > resjunk means that the target is not supposed to be output by the query. > Since it's there at all, it's presumably referenced by ORDER BY or GROUP > BY or DISTINCT ON, but the meaning of the flag doesn't depend on that. > > What you would need to do is verify that the target is resjunk and not > used in any clause besides ORDER BY. I have not read your patch, but > I rather imagine that what you've got now is that the parser checks this > and sets the new flag for consumption far downstream. Why not just make > the same check in the planner? I've created a patch using this approach. Please find the patch attached. > A more invasive, but possibly cleaner in the long run, approach is to > strip all resjunk targets from the query's tlist at the start of > planning and only put them back if needed. > > BTW, when I looked at this a couple years ago, it seemed like the major > problem was that the planner assumes that all plans for the query should > emit the same tlist, and thus that tlist eval cost isn't a > distinguishing factor. Breaking that assumption seemed to require > rather significant refactoring. I never found the time to try to > actually do it. Such an approach would improve code readability, but I'm not sure it's worth the work for this optimization, though perhaps I'm missing something. 
Thanks, Best regards, Etsuro Fujita unused-targets-2.patch Description: Binary data
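The planner-side check Tom suggests can be sketched with simplified stand-ins. This is a hedged illustration, not PostgreSQL code: the struct shape and function names are hypothetical, with ressortgroupref modeling the sort/group reference number a TargetEntry carries, and the clauses reduced to flat arrays of referenced numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical stand-in for the relevant TargetEntry fields */
typedef struct
{
    bool resjunk;          /* not part of the query's output */
    int  ressortgroupref;  /* 0 if not referenced by any sort/group clause */
} TleSketch;

/* does any clause in the given list reference this sortgroupref? */
static bool
ref_in_clause(int sortgroupref, const int *refs, int n)
{
    for (int i = 0; i < n; i++)
        if (refs[i] == sortgroupref)
            return true;
    return false;
}

/*
 * True if the entry is resjunk, is referenced by ORDER BY, and is not
 * referenced by any other clause (GROUP BY, DISTINCT ON, ...), so it
 * exists only to feed the sort and could be dropped from the plan's
 * tlist when the sort is satisfied some other way.
 */
static bool
used_only_for_orderby(const TleSketch *tle,
                      const int *orderby_refs, int n_orderby,
                      const int *other_refs, int n_other)
{
    if (!tle->resjunk || tle->ressortgroupref == 0)
        return false;
    if (!ref_in_clause(tle->ressortgroupref, orderby_refs, n_orderby))
        return false;          /* junk for some other reason */
    return !ref_in_clause(tle->ressortgroupref, other_refs, n_other);
}
```

The point of doing this in the planner rather than the parser is that no new flag needs to travel downstream: the planner already has the clause lists in hand when it builds the plan's tlist.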
Re: [HACKERS] corrupt pages detected by enabling checksums
On 6 April 2013 15:44, Andres Freund wrote: > > * In xlog_redo, it seemed slightly awkward to call XLogRecGetData twice. > > Merely a matter of preference but I thought I would mention it. > > You're absolutely right, memcpy should have been passed 'data', not > XLogRecGetData(). Applied, with this as the only code change. Thanks, everybody, for the good research, coding, and fast testing. We're in good shape now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
On 6 April 2013 08:40, Heikki Linnakangas wrote: > AFAICS that could be easily avoided by doing a simple PageGetLSN() like we > used to, if checksums are not enabled. In XLogCheckBuffer: > > /* > * XXX We assume page LSN is first data on *every* page that can be passed > * to XLogInsert, whether it otherwise has the standard page layout or > * not. We don't need the buffer header lock for PageGetLSN because we > * have exclusive lock on the page and/or the relation. > */ > *lsn = BufferGetLSNAtomic(rdata->buffer); > > Also, the second sentence in the above comment is completely bogus now. Both points addressed in separate commits. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services