[HACKERS] TODO links broken?
Hello,

Many of the links in the TODO wiki page result in a "page not found" error. Is this page up to date? Can anything be inferred about the status of these items from the broken links? Thanks.

http://wiki.postgresql.org/wiki/Todo
Re: [HACKERS] ancient sequence point bug
-Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane Sent: Tuesday, April 16, 2013 7:52 PM To: Peter Eisentraut Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] ancient sequence point bug >Peter Eisentraut writes: >> This code in bootstrap.c contains a sequence point violation (or >> whatever that is really called): > >> while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) >> { >> (*app)->am_oid = HeapTupleGetOid(tup); >> memmove((char *) &(*app++)->am_typ, >> (char *) GETSTRUCT(tup), >> sizeof((*app)->am_typ)); >> } > >What exactly is the violation? sizeof() is a side-effect-free compile time >constant, and the first value to be passed to memmove seems perfectly well >defined. Unless it is C99 and the object is a VLA, which must be evaluated at runtime. I guess that (*app)->am_typ is not a VLA, especially since PostgreSQL does not require C99 to compile. >I grant that this is not terribly good coding style, but I don't believe >there's an actual risk here. I agree that there is no risk in the sizeof operator. According to my understanding, even this is legal: unsigned long long *p = 0; size_t stupid = sizeof (*p); printf("size of a long long is %u\n", (unsigned) stupid); But I also guess that most compilers will have a cow when scanning it. >> In commit 1aebc361, another place in the same file was fixed like this: > >Well, I don't really remember why I did that twelve years ago, but seeing that >there are a number of purely-cosmetic changes in that commit, I'm thinking it >was only meant to be cosmetic. > >I certainly have no objection to making this code look more like that code, >I'm just not seeing that it's a bug. You are right. It's not a bug. However, I would not be surprised if GCC or CLANG would shriek at the higher warning levels, usually not being able to apply artificial intelligence to solve problems that seem simple to humans. 
In the interest of "quiet compiles", equivalent code that does not produce a warning seems like a good idea. IMO-YMMV.

> regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ancient sequence point bug
Peter Eisentraut writes:
> This code in bootstrap.c contains a sequence point violation (or
> whatever that is really called):

> 	while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
> 	{
> 		(*app)->am_oid = HeapTupleGetOid(tup);
> 		memmove((char *) &(*app++)->am_typ,
> 				(char *) GETSTRUCT(tup),
> 				sizeof((*app)->am_typ));
> 	}

What exactly is the violation? sizeof() is a side-effect-free compile time constant, and the first value to be passed to memmove seems perfectly well defined.

I grant that this is not terribly good coding style, but I don't believe there's an actual risk here.

> In commit 1aebc361, another place in the same file was fixed like this:

Well, I don't really remember why I did that twelve years ago, but seeing that there are a number of purely-cosmetic changes in that commit, I'm thinking it was only meant to be cosmetic.

I certainly have no objection to making this code look more like that code, I'm just not seeing that it's a bug.

			regards, tom lane
[HACKERS] ancient sequence point bug
This code in bootstrap.c contains a sequence point violation (or whatever that is really called):

	while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
	{
		(*app)->am_oid = HeapTupleGetOid(tup);
		memmove((char *) &(*app++)->am_typ,
				(char *) GETSTRUCT(tup),
				sizeof((*app)->am_typ));
	}

In commit 1aebc361, another place in the same file was fixed like this:

	@@ -445,13 +455,13 @@ struct typmap
	 	while (HeapTupleIsValid(tup = heap_getnext(scan, 0)))
	 	{
	 		(*app)->am_oid = tup->t_data->t_oid;
	-		memmove((char *) &(*app++)->am_typ,
	-				(char *) GETSTRUCT(tup),
	-				sizeof((*app)->am_typ));
	+		memcpy((char *) &(*app)->am_typ,
	+			   (char *) GETSTRUCT(tup),
	+			   sizeof((*app)->am_typ));
	+		app++;
	 	}
	 	heap_endscan(scan);
	 	heap_close(rel, NoLock);

I think the same (move the app++, and optionally change to memcpy) should be done in the first case as well.
Re: [HACKERS] Enabling Checksums
On Apr16, 2013, at 23:41 , Ants Aasma wrote: > On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug wrote: >> On Apr13, 2013, at 17:14 , Ants Aasma wrote: >>> Based on current analysis, it is particularly good at detecting single >>> bit errors, as good at detecting burst errors as can be expected from >>> 16 bits and not horrible at detecting burst writes of zeroes. It is >>> quite bad at detecting multiple uncorrelated single bit errors and >>> extremely bad at detecting repeating patterns of errors in low order >>> bits. >> >> I've read the patch and tried to understand why it's that bad at >> detecting repeating patterns of errors in low order bits, and to see >> if there might be a way to fix that without too much of a performance >> impact. >> >> Here's what I gather the algorithm does: >> >> It treats the input data, a page of L bytes, as a Nx64 matrix V >> of 16-bit quantities (N = L/64, obviously). >> It then first computes (using two primes p (PRIME1) and q (PRIME2)) >> >>S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0 >> + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0 >> + … >> + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0 >> (mod 2^16) >> = sum V[i,j]*p^(64-i)*q^(64-j) >> >> Note that it does that by first computing the row-wise sums without >> the q^i coefficient, and then (in what the code calls the aggregation >> phase) combines those row-wise sums into a total, adding the q^i- >> coefficients along the way. >> >> The final hash value is then >> >> H = S * p + B * q mod 2^16 >> >> where B is a salt value intended to detect swapped pages (currently >> B is simply the page index) > > Great job analyzing the analytic form of the algorithm and sorry I you > had to do it instead finding it in the documentation. No problem, glad if I can help! >> This raises two question. First, why are there two primes? You could >> just as well using a single prime q and set p=q^64 mod 2^16. 
You then >> get >> S = sum V[i,j] * q^(64*(64-i) + (64-j) >>= sum V[i,j] * q^(4096 - 64*(i-1) - j) >> You get higher prime powers that way, but you can easily choose a prime >> that yields distinct values mod 2^16 for exponents up to 16383. Your >> PRIME2, for example, does. (It wraps around for 16384, i.e. >> PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since >> 16384 is the Carmichael function's value at 2^16) > > The experimental detection rate is about the same if we use a single > prime. But I think you have the analytical form wrong here. It should > be given q = p: > >S = sum V[i,j] * p^(64-i) * p^(64-j) > = sum V[i,j] * p^(64 - i + 64 - j) > = sum V[i,j] * p^(128 - i -j) Yeah, if you set q = p that's true. My suggestion was p=q^64 though... >> Second, why does it use addition instead of XOR? It seems that FNV >> usually XORs the terms together instead of adding them? > > Testing showed slightly better detection rate for adds. Intuitively I > think it's because the carry introduces some additional mixing. Hm, but OTOH it makes S linear in V, i.e. if you have two inputs V1,V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then S' = S*m. The second property is quite undesirable, I think. Assume all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit positions 0..(k-1). Then, due to linearity, S is also divisible by 2^k, i.e. also has no ones before the k-th bit. This means, for example that if you hash values values which all have their lowest bit cleared, you get only 2^15 distinct hash values. If they all have the two lowest bits cleared, you get only 2^14 distinct values, and so on… Generally, linearity doesn't seem to be a property that one wants in a hash I think, so my suggestion is to stick to XOR. >> Here, btw, is a page on FNV hashing. It mentions a few rules for >> picking suitable primes >> >> http://www.isthe.com/chongo/tech/comp/fnv > > Unfortunately the rules don't apply here because of the hash size. 
Yeah :-(. I noticed that their 32-bit prime only has a single one outside the first 16 bits. Maybe we can take advantage of that and use a 32-bit state while still providing decent performance on machines without a 32-bit x 32-bit -> 32-bit multiply instruction?

If we lived in an Intel-only world, I'd suggest going with a 32-bit state, since SSE4.1 support is *very* wide-spread already - the last CPUs without it came out over 5 years ago, I think. (Core2 and later support SSE4.1, and some later Core1 do too) But unfortunately things look bleak even for other x86 implementations - AMD supports SSE4.1 only starting with Bulldozer, which came out 2011 or so I believe.

Leaving the x86 realm, it seems that only ARM's NEON provides the instructions we'd need - AltiVec seems to support only 16-bit multiplies, and from what some quick googling brought up, MIPS and SPARC SIMD instructions look no better.

OTOH, chances are that nobody will ever
[HACKERS] Change to pg_test_fsync output
I propose the attached patch to change pg_test_fsync's output from "microsecs" to "usecs", which is the designation we use in other places. Also remove parentheses, e.g.

	 1 * 16kB open_sync write         8012.933 ops/sec     125 usecs/op
	 2 *  8kB open_sync writes        5399.901 ops/sec     185 usecs/op
	 4 *  4kB open_sync writes        3340.958 ops/sec     299 usecs/op
	 8 *  2kB open_sync writes        1903.621 ops/sec     525 usecs/op
	16 *  1kB open_sync writes        1032.606 ops/sec     968 usecs/op

This is new output for 9.3.

-- 
  Bruce Momjian    http://momjian.us
  EnterpriseDB     http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 7bc0d0e..b978d9e
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
*** 25,31
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec (%6.0f microsecs/op)"
  #define USECS_SEC			100

  /* These are macros to avoid timing the function call overhead. */
--- 25,31
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec  %6.0f usecs/op"
  #define USECS_SEC			100

  /* These are macros to avoid timing the function call overhead. */
Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database
On 4/13/13 12:44 PM, Tomas Vondra wrote: I'm currently struggling with (quite uncommon) deployments where databases are created/dropped regularly (not to mention tables and indexes), and it's surprisingly difficult to process such stats to get reasonable values. Yes, it's a pain. If you aggregate the table level data available now, you'll end up with some missing activity. Work done between the last snapshot and when the drop happened is gone right now, whereas your aggregated stats view would preserve that activity. The real fun is if the new table has the same name as the old one, which gives you all the negative value headaches a pg_stat_reset introduces too. It's possible to make a case for why database level aggregate statistics are useful. I don't know that yours is compelling enough to justify itself though, given that as you say this is an uncommon situation. In something similar to your setup I've just accepted that I have to save the snapshots into a table, will occasionally lose some mid-snapshot data, and I use a combination of correction updates to that table and SQL window functions to provide a useful view. It's a pain to do and I end up having to customize this approach for seemingly every install, but it can be made accurate enough for most uses. The gigantic hole in this area I was most interested in for 9.4 development was the lack of write statistics. Not having pg_stat_database.blks_write, pg_statio_user_tables.heap_blks_write or pg_statio_user_indexes.idx_blks_write is a crippling operations/planning limitation of the database. From that perspective, now isn't quite the right time to add more aggregation on top of that data, since the aggregation will make adding additional counters a bit more complicated. It's not a big difference, but thinking in that direction doesn't help your suggestion. 
-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug wrote: > On Apr13, 2013, at 17:14 , Ants Aasma wrote: >> Based on current analysis, it is particularly good at detecting single >> bit errors, as good at detecting burst errors as can be expected from >> 16 bits and not horrible at detecting burst writes of zeroes. It is >> quite bad at detecting multiple uncorrelated single bit errors and >> extremely bad at detecting repeating patterns of errors in low order >> bits. > > I've read the patch and tried to understand why it's that bad at > detecting repeating patterns of errors in low order bits, and to see > if there might be a way to fix that without too much of a performance > impact. > > Here's what I gather the algorithm does: > > It treats the input data, a page of L bytes, as a Nx64 matrix V > of 16-bit quantities (N = L/64, obviously). > It then first computes (using two primes p (PRIME1) and q (PRIME2)) > > S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0 > + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0 > + … > + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0 > (mod 2^16) > = sum V[i,j]*p^(64-i)*q^(64-j) > >Note that it does that by first computing the row-wise sums without >the q^i coefficient, and then (in what the code calls the aggregation >phase) combines those row-wise sums into a total, adding the q^i- >coefficients along the way. > >The final hash value is then > > H = S * p + B * q mod 2^16 > >where B is a salt value intended to detect swapped pages (currently >B is simply the page index) Great job analyzing the analytic form of the algorithm and sorry I you had to do it instead finding it in the documentation. > This raises two question. First, why are there two primes? You could > just as well using a single prime q and set p=q^64 mod 2^16. 
You then > get > S = sum V[i,j] * q^(64*(64-i) + (64-j) > = sum V[i,j] * q^(4096 - 64*(i-1) - j) > You get higher prime powers that way, but you can easily choose a prime > that yields distinct values mod 2^16 for exponents up to 16383. Your > PRIME2, for example, does. (It wraps around for 16384, i.e. > PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since > 16384 is the Carmichael function's value at 2^16) The experimental detection rate is about the same if we use a single prime. But I think you have the analytical form wrong here. It should be given q = p: S = sum V[i,j] * p^(64-i) * p^(64-j) = sum V[i,j] * p^(64 - i + 64 - j) = sum V[i,j] * p^(128 - i -j) This makes the whole matrix symmetric. While I can't think of any real world errors that would exhibit symmetry in this 64x64 matrix, it seemed better to me to avoid the issue altogether and use different primes. IIRC it helped a lot for the case of page[i] = i & 0xFF. > Second, why does it use addition instead of XOR? It seems that FNV > usually XORs the terms together instead of adding them? Testing showed slightly better detection rate for adds. Intuitively I think it's because the carry introduces some additional mixing. > Regarding the bad behaviour for multiple low-bit errors - can you > explain why it behaves badly in that case? I currently fail to see > why that would be. I *can* see that the lowest bit of the hash depends > only on the lowest bit of the input words, but as long as the lowest > bits of the input words also affect other bits of the hash, that shouldn't > matter. Which I think they do, but I might be missing something... Looks like you're right. I was somehow concentrating only on how the lowest bits depend on the input. > Here, btw, is a page on FNV hashing. It mentions a few rules for > picking suitable primes > > http://www.isthe.com/chongo/tech/comp/fnv Unfortunately the rules don't apply here because of the hash size. Thanks for your analysis. 
I will do my best to get this all into a document and will do some more analysis to see if I can come up with some kind of proof for the error cases.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On Tue, Apr 16, 2013 at 6:09 PM, Tom Lane wrote:
>
> [...]
>
> Or, if you'd rather a more direct answer: wanting this sounds like
> evidence of bad application design.  Why is your app dependent on
> getting failures from currval, and isn't there a better way to do it?
>

The sequence cache (seqtab) is kept per backend, so if we use a connection pooler (like pgbouncer in session mode) between our app and postgres we can get a wrong value from 'currval' because the backend isn't completely clean. Isn't this a good reason to implement this feature?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Enabling Checksums
Ants Aasma writes:
> On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs wrote:
>> My only review comments are to ask for some explanation of the magic
>> numbers...

> The specific values used are mostly magic to me too. As mentioned in a
> short sentence in the patch, the values are experimentally chosen,
> guided by some intuition about what good values should look like.

There actually is quite a lot of theory out there about this sort of thing. If we are inventing our own checksum function then We're Doing It Wrong. We should be adopting an existing, proven function. "Experimentally derived" is about the worst recommendation I can think of in this area.

			regards, tom lane
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
Bruce Momjian writes:
> I think his point is why don't we clear currval() on DISCARD ALL?  I
> can't think of a good reason we don't.

Because we'd have to invent a new suboperation DISCARD SEQUENCES, for one thing, in order to be consistent. I'd rather ask why it's important that we should throw away such state. It doesn't seem to me to be important enough to justify a new subcommand.

Or, if you'd rather a more direct answer: wanting this sounds like evidence of bad application design. Why is your app dependent on getting failures from currval, and isn't there a better way to do it?

			regards, tom lane
Re: [HACKERS] Enabling Checksums
On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs wrote: > On 9 April 2013 03:35, Ants Aasma wrote: >> On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma wrote: >>> Unless somebody tells me not to waste my time I'll go ahead and come >>> up with a workable patch by Monday. >> >> And here you go. I decided to be verbose with the comments as it's >> easier to delete a comment to write one. I also left in a huge jumble >> of macros to calculate the contents of a helper var during compile >> time. This can easily be replaced with the calculated values once we >> settle on specific parameters. >> >> Currently only x86-64 is implemented. 32bit x86 would be mostly a >> copy-and-paste job, replacing 64bit pointer registers with 32bit ones. >> For other platforms the simplest way would be to use a vectorizing >> compiler on the generic variant. -funroll-loops -ftree-vectorize is >> enough on gcc. >> >> Quick bench results on the worst case workload: >> master no checksums: tps = 15.561848 >> master with checksums: tps = 1.695450 >> simd checksums: tps = 14.602698 > > Numbers look very good on this. Well done. > > I support the direction of this, but I don't think I'm sufficiently > well qualified to verify that the code does what it should and/or fix > it if it breaks. If others want to see this happen you'll need to > pitch in. > > My only review comments are to ask for some explanation of the magic > numbers... > #define CSUM_PRIME1 0x49 > #define CSUM_PRIME2 0x986b > #define CSUM_TRUNC 65521 > > Where magic means a level of technology far above my own > understanding, and yet no (or not enough) code comments to assist me. The specific values used are mostly magic to me too. As mentioned in a short sentence in the patch, the values are experimentally chosen, guided by some intuition about what good values should look like. 
Basically the methodology for the choice was that I took all the pages from a 2.8GB test database, and then for each page introduced a bunch of common errors and observed how many errors were undetected. The main observations were:

1) the exact value of the primes doesn't really matter for detection efficiency.
2) values with a non-uniform distribution of zeroes and ones seem to work slightly better.

I'll write up a readme of why the values are how they are and with some more explanation of the algorithm.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On Tue, Apr 16, 2013 at 12:13:48PM -0700, Adrian Klaver wrote:
> On 04/16/2013 08:07 AM, Nigel Heron wrote:
> >
> > On 04/15/2013 05:57 PM, Adrian Klaver wrote:
> >> On 04/15/2013 02:42 PM, Nigel Heron wrote:
> >>> Hi,
> >>> is there a way to clear the session state of sequence values fetched by
> >>> currval(regclass)? "DISCARD ALL" doesn't seem to do it.
> >
> >> Might want to take a look at:
> >>
> >> http://www.depesz.com/2012/12/02/what-is-the-point-of-bouncing/
> >>
> >> for some hints on dealing with sequences and pgBouncer.
> >
> > thanks, I read it (his blogs are always interesting!). I'm not disputing
> > that calling currval() at the wrong time is a bad idea.
> > I'm just wondering why DISCARD ALL clears everything but this?
>
> Well per the docs:
>
> http://www.postgresql.org/docs/9.2/interactive/sql-discard.html
>
> DISCARD ALL
>
> is equivalent to:
>
> SET SESSION AUTHORIZATION DEFAULT;
> RESET ALL;
> DEALLOCATE ALL;
> CLOSE ALL;
> UNLISTEN *;
> SELECT pg_advisory_unlock_all();
> DISCARD PLANS;
> DISCARD TEMP;
>
> AFAIK, none of the above affect sequences.

I think his point is why don't we clear currval() on DISCARD ALL? I can't think of a good reason we don't. He is saying currval() should throw an error after DISCARD ALL:

	test=> SELECT currval('seq');
	ERROR:  currval of sequence "seq" is not yet defined in this session

I have moved this thread to hackers to get comments.

-- 
  Bruce Momjian    http://momjian.us
  EnterpriseDB     http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Apr13, 2013, at 17:14 , Ants Aasma wrote: > Based on current analysis, it is particularly good at detecting single > bit errors, as good at detecting burst errors as can be expected from > 16 bits and not horrible at detecting burst writes of zeroes. It is > quite bad at detecting multiple uncorrelated single bit errors and > extremely bad at detecting repeating patterns of errors in low order > bits. I've read the patch and tried to understand why it's that bad at detecting repeating patterns of errors in low order bits, and to see if there might be a way to fix that without too much of a performance impact. Here's what I gather the algorithm does: It treats the input data, a page of L bytes, as a Nx64 matrix V of 16-bit quantities (N = L/64, obviously). It then first computes (using two primes p (PRIME1) and q (PRIME2)) S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0 + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0 + … + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0 (mod 2^16) = sum V[i,j]*p^(64-i)*q^(64-j) Note that it does that by first computing the row-wise sums without the q^i coefficient, and then (in what the code calls the aggregation phase) combines those row-wise sums into a total, adding the q^i- coefficients along the way. The final hash value is then H = S * p + B * q mod 2^16 where B is a salt value intended to detect swapped pages (currently B is simply the page index) This raises two question. First, why are there two primes? You could just as well using a single prime q and set p=q^64 mod 2^16. You then get S = sum V[i,j] * q^(64*(64-i) + (64-j) = sum V[i,j] * q^(4096 - 64*(i-1) - j) You get higher prime powers that way, but you can easily choose a prime that yields distinct values mod 2^16 for exponents up to 16383. Your PRIME2, for example, does. (It wraps around for 16384, i.e. 
PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since 16384 is the Carmichael function's value at 2^16) Second, why does it use addition instead of XOR? It seems that FNV usually XORs the terms together instead of adding them? Regarding the bad behaviour for multiple low-bit errors - can you explain why it behaves badly in that case? I currently fail to see why that would be. I *can* see that the lowest bit of the hash depends only on the lowest bit of the input words, but as long as the lowest bits of the input words also affect other bits of the hash, that shouldn't matter. Which I think they do, but I might be missing something... Here, btw, is a page on FNV hashing. It mentions a few rules for picking suitable primes http://www.isthe.com/chongo/tech/comp/fnv best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 16 April 2013 20:27, Tom Lane wrote:
> Greg Stark writes:
>> On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs wrote:
>>> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
>>> We reuse the WAL files, so the prev field in each WAL record shows
>>> what the previous end of WAL was. Hence if the WAL checksums give a
>>> false positive we still have a double check that the data really is
>>> wrong. It's unbelievable that you'd get a false positive and then have
>>> the prev field match as well, even though it was the genuine
>>> end-of-WAL.
>
>> This is kind of true and kind of not true. If a system loses power
>> while writing lots of data to WAL then the blocks at the end of the
>> WAL might not be written out in order. Everything since the last log
>> sync might be partly written out and partly not written out. That's
>> the case where the checksum is critical. The beginning of a record
>> could easily be written out including xl_prev and the end of the
>> record not written. 1/64,000 power losses would then end up with an
>> assertion failure or corrupt database.
>
> I have a hard time believing that it's a good idea to add checksums to
> data pages and at the same time weaken our ability to detect WAL
> corruption. So this seems to me to be going in the wrong direction.
> What's it buying for us anyway? A few CPU cycles saved during WAL
> generation? That's probably swamped by the other costs of writing WAL,
> especially if you're using replication.

This part of the thread is dead now; I said "let's drop this idea" on 13 April.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
Greg Stark writes: > On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs wrote: >> * WAL checksum is not used as the sole basis for end-of-WAL discovery. >> We reuse the WAL files, so the prev field in each WAL record shows >> what the previous end of WAL was. Hence if the WAL checksums give a >> false positive we still have a double check that the data really is >> wrong. It's unbelievable that you'd get a false positive and then have >> the prev field match as well, even though it was the genuine >> end-of-WAL. > This is kind of true and kind of not true. If a system loses power > while writing lots of data to WAL then the blocks at the end of the > WAL might not be written out in order. Everything since the last log > sync might be partly written out and partly not written out. That's > the case where the checksum is critical. The beginning of a record > could easily be written out including xl_prev and the end of the > record not written. 1/64,000 power losses would then end up with an > assertion failure or corrupt database. I have a hard time believing that it's a good idea to add checksums to data pages and at the same time weaken our ability to detect WAL corruption. So this seems to me to be going in the wrong direction. What's it buying for us anyway? A few CPU cycles saved during WAL generation? That's probably swamped by the other costs of writing WAL, especially if you're using replication. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger API documentation?
On 4/16/13 7:49 AM, Robert Haas wrote:
> On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut wrote:
>> I'm having trouble finding documentation about how to write event
>> triggers.  The chapter in the documentation
>>
>> http://www.postgresql.org/docs/devel/static/event-triggers.html
>>
>> says they can be written in C or supported PLs, but does not explain it
>> any further.  Is there any documentation for it?
>
> That chapter has two subsections that may be useful, and there's a bit
> more here:
>
> http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
>
> Now that you mention it, though, it looks a little sparse.

I'm specifically looking for C API documentation, along the lines of http://www.postgresql.org/docs/devel/static/trigger-interface.html. The current chapter on event triggers might as well be ripped out and folded into the CREATE EVENT TRIGGER reference page, because it explains nothing about programming those triggers.
Re: [HACKERS] Enabling Checksums
On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs wrote:
> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
> We reuse the WAL files, so the prev field in each WAL record shows
> what the previous end of WAL was. Hence if the WAL checksums give a
> false positive we still have a double check that the data really is
> wrong. It's unbelievable that you'd get a false positive and then have
> the prev field match as well, even though it was the genuine
> end-of-WAL.

This is kind of true and kind of not true. If a system loses power
while writing lots of data to WAL then the blocks at the end of the
WAL might not be written out in order. Everything since the last log
sync might be partly written out and partly not written out. That's
the case where the checksum is critical. The beginning of a record
could easily be written out including xl_prev and the end of the
record not written. 1/64,000 power losses would then end up with an
assertion failure or corrupt database.

-- 
greg
Re: [HACKERS] COPY and Volatile default expressions
On 16 April 2013 15:07, Tom Lane wrote:
> Bruce Momjian writes:
>> I found Simon's nextval()/COPY timings without this patch sobering. I
>> assume he can apply this for 9.3, right? I believe it is a fix for a
>> new 9.3 feature.
>
> It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
> went in in 9.2), and personally I'd vote against applying it now,
> especially if Simon is expecting anybody to review it.

I wrote this expecting it to be a 9.4 feature. I'm good with that.
There are other somewhat related optimisations also.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] COPY and Volatile default expressions
On Tue, Apr 16, 2013 at 10:07:07AM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > I found Simon's nextval()/COPY timings without this patch sobering. I
> > assume he can apply this for 9.3, right? I believe it is a fix for a
> > new 9.3 feature.
>
> It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
> went in in 9.2), and personally I'd vote against applying it now,
> especially if Simon is expecting anybody to review it.

Oh, I thought multi-insert was 9.3.

-- 
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] COPY and Volatile default expressions
Bruce Momjian writes:
> I found Simon's nextval()/COPY timings without this patch sobering. I
> assume he can apply this for 9.3, right? I believe it is a fix for a
> new 9.3 feature.

It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
went in in 9.2), and personally I'd vote against applying it now,
especially if Simon is expecting anybody to review it.

regards, tom lane
Re: [HACKERS] Enabling Checksums
On 9 April 2013 03:35, Ants Aasma wrote:
> On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma wrote:
>> Unless somebody tells me not to waste my time I'll go ahead and come
>> up with a workable patch by Monday.
>
> And here you go. I decided to be verbose with the comments as it's
> easier to delete a comment than to write one. I also left in a huge
> jumble of macros to calculate the contents of a helper var during
> compile time. This can easily be replaced with the calculated values
> once we settle on specific parameters.
>
> Currently only x86-64 is implemented. 32bit x86 would be mostly a
> copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
> For other platforms the simplest way would be to use a vectorizing
> compiler on the generic variant. -funroll-loops -ftree-vectorize is
> enough on gcc.
>
> Quick bench results on the worst case workload:
> master no checksums: tps = 15.561848
> master with checksums: tps = 1.695450
> simd checksums: tps = 14.602698

Numbers look very good on this. Well done.

I support the direction of this, but I don't think I'm sufficiently
well qualified to verify that the code does what it should and/or fix
it if it breaks. If others want to see this happen you'll need to
pitch in.

My only review comments are to ask for some explanation of the magic
numbers...

#define CSUM_PRIME1 0x49
#define CSUM_PRIME2 0x986b
#define CSUM_TRUNC 65521

Where magic means a level of technology far above my own
understanding, and yet no (or not enough) code comments to assist me.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] COPY and Volatile default expressions
On Tue, Apr 16, 2013 at 02:37:33PM +0100, Simon Riggs wrote:
> On 16 April 2013 13:57, Heikki Linnakangas wrote:
> > You still need to check the args, if the function is nextval, otherwise you
> > incorrectly perform the optimization for something like
> > "nextval(myvolatilefunc())".
>
> Guess so. At least it's an easy change.

I found Simon's nextval()/COPY timings without this patch sobering. I
assume he can apply this for 9.3, right? I believe it is a fix for a
new 9.3 feature.

-- 
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] COPY and Volatile default expressions
On 16 April 2013 13:57, Heikki Linnakangas wrote:
> You still need to check the args, if the function is nextval, otherwise you
> incorrectly perform the optimization for something like
> "nextval(myvolatilefunc())".

Guess so. At least it's an easy change. Thanks for checking.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

[Attachment: copy_tuning_with_default_nextval.v2.patch]
Re: [HACKERS] COPY and Volatile default expressions
On 16.04.2013 14:38, Simon Riggs wrote:
> On 15 April 2013 21:53, Simon Riggs wrote:
>> So I'll treat this as two separate cases:
>> * add special case for sequences
>
> Patch attached.

+	if (IsA(node, FuncExpr))
+	{
+		FuncExpr   *expr = (FuncExpr *) node;
+
+		/*
+		 * For this case only, we want to ignore the volatility of the
+		 * nextval() function, since some callers want this. nextval()
+		 * has no other args other than sequence name, so we can just
+		 * return false immediately in this case.
+		 */
+		if (expr->funcid == F_NEXTVAL_OID)
+			return false;
+		else if (func_volatile(expr->funcid) == PROVOLATILE_VOLATILE)
+			return true;
+		/* else fall through to check args */
+	}
...

You still need to check the args, if the function is nextval, otherwise
you incorrectly perform the optimization for something like
"nextval(myvolatilefunc())".

- Heikki
Re: [HACKERS] event trigger API documentation?
On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut wrote:
> I'm having trouble finding documentation about how to write event
> triggers. The chapter in the documentation
>
> http://www.postgresql.org/docs/devel/static/event-triggers.html
>
> says they can be written in C or supported PLs, but does not explain it
> any further. Is there any documentation for it?

That chapter has two subsections that may be useful, and there's a bit
more here:

http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER

Now that you mention it, though, it looks a little sparse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] COPY and Volatile default expressions
On 15 April 2013 21:53, Simon Riggs wrote:
> So I'll treat this as two separate cases:
> * add special case for sequences

Patch attached.

> * use the NO SQL mechanism, as described, which implies no reads or
> writes of database state. We could test that, but it's somewhat harder
> and we'd need to test for that on entry to any function, which I don't
> much like.

For another time.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

[Attachment: copy_tuning_with_default_nextval.v1.patch]
[HACKERS] [PATCH] Add \ns command to psql
Hi,

Here's a new version of a small patch to psql I'm using locally. It adds
a command \ns to psql which is a shortcut to set the SEARCH_PATH
variable.

I'd like to make a case for including this patch as it makes use of
schemas/namespaces much easier. There was resistance to including this
before just because some developers don't use schemas very much. But we
use a lot of them. And I'm sure we're not alone.

Previously I used just \n but there was some resistance to this because
the single-letter commands are becoming scarce.

I've also added tab completion, making this command much more useful. I
don't think tab completion would be possible if this command was defined
as a variable (which was another suggestion offered at the time).

Cheers,
Colin

[Attachments: command.c.diff, tab-complete.c.diff]
Re: [HACKERS] "stored procedures"
Hello

2013/4/16 aasat:
> Is "stored procedures" planned in future? I think it is a "most
> missing" feature today in Postgres.

It is in the TODO, but nobody is working on this feature at the moment,
probably.

> Using a dblink to emulate commit in transaction is very complicated

Probably autonomous transactions will be implemented first - and that
should be a really nice feature.

Regards
Pavel
Re: [HACKERS] "stored procedures"
Is "stored procedures" planned in future? I think it is a "most
missing" feature today in Postgres.

Using a dblink to emulate commit in transaction is very complicated.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/stored-procedures-tp4331060p5752274.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.