Re: [HACKERS] event trigger API documentation?
Peter Eisentraut pete...@gmx.net writes:
> I'm specifically looking for C API documentation, along the lines of
> http://www.postgresql.org/docs/devel/static/trigger-interface.html. The
> current chapter on event triggers might as well be ripped out and folded
> into the CREATE EVENT TRIGGER reference page, because it explains nothing
> about programming those triggers.

I'm not sure about ripping it out; that does not sound like a good idea to me. It does need some additions and C-level examples, yes.

The plan was to build a contrib module as an example that would cancel any (supported) command you try to run by means of ereport(ERROR, …), then add that in pieces to the docs with details about what's going on. While the commit fest was still running didn't look like the right time to work on that; beta looks like the right time.

What do you think about that proposal?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao masao.fu...@gmail.com wrote:
>> At fast shutdown, after walsender sends the checkpoint record and closes
>> the replication connection, walreceiver can detect the close of connection
>> before receiving all WAL records. This means that, even if walsender sends
>> all WAL records, walreceiver cannot always receive all of them.
>
> That sounds like a bug in walreceiver to me. The following code in
> walreceiver's main loop looks suspicious:
>
>     /*
>      * Process the received data, and any subsequent data we
>      * can read without blocking.
>      */
>     for (;;)
>     {
>         if (len > 0)
>         {
>             /* Something was received from master, so reset timeout */
>             ...
>             XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
>         }
>         else if (len == 0)
>             break;
>         else if (len < 0)
>         {
>             ereport(LOG,
>                     (errmsg("replication terminated by primary server"),
>                      errdetail("End of WAL reached on timeline %u at %X/%X",
>                                startpointTLI,
>                                (uint32) (LogstreamResult.Write >> 32),
>                                (uint32) LogstreamResult.Write)));
>             ...
>         }
>         len = walrcv_receive(0, &buf);
>     }
>
>     /* Let the master know that we received some data. */
>     XLogWalRcvSendReply(false, false);
>
>     /*
>      * If we've written some records, flush them to disk and
>      * let the startup process and primary server know about
>      * them.
>      */
>     XLogWalRcvFlush(false);
>
> The loop at the top looks fine - it specifically avoids throwing an error
> on EOF. But the code then proceeds to XLogWalRcvSendReply() which doesn't
> seem to have the same smarts - it simply does
>
>     if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>         PQflush(streamConn))
>         ereport(ERROR,
>                 (errmsg("could not send data to WAL stream: %s",
>                         PQerrorMessage(streamConn))));
>
> Unless I'm missing something, that certainly seems to explain how a
> standby can lag behind even after a controlled shutdown of the master.

Do you mean to say that, as an error has occurred, it would not be able to flush the received WAL, which could result in loss of WAL? I think even if an error occurs, it will call flush in WalRcvDie() before terminating the walreceiver.
With Regards,
Amit Kapila.
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote:
> Do you mean to say that, as an error has occurred, it would not be able to
> flush the received WAL, which could result in loss of WAL? I think even if
> an error occurs, it will call flush in WalRcvDie() before terminating the
> walreceiver.

Hm, true, but for that to prevent the problem the inner processing loop needs to always read up to EOF before it exits and we attempt to send a reply, which I don't think it necessarily does.

Assume that the master sends a chunk of data, waits a bit, and finally sends the shutdown record and exits. The slave might then receive the first chunk, and it might trigger sending a reply. At the time the reply is sent, the master has already sent the shutdown record and closed the connection, and we'll thus fail to reply and abort. Since the shutdown record has never been read from the socket, XLogWalRcvFlush won't flush it, and the slave ends up behind the master.

Also, since XLogWalRcvProcessMsg responds to keep-alive messages, we might also error out of the inner processing loop if the server closes the socket after sending a keepalive but before we attempt to respond.

Fixing this on the receive side alone seems quite messy and fragile. So instead, I think we should let the master send a shutdown message after it has sent everything it wants to send, and wait for the client to acknowledge it before shutting down the socket. If the client fails to respond, we could log a fat WARNING.

best regards,
Florian Pflug
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck singularsyn...@gmail.com wrote:
> Many of the links in the TODO wiki page result in a page not found error.
> Is this page up-to-date? Can anything be inferred about the status of
> these items from the broken link?

I think what we can infer is that the new archives code is broken. I hope someone is planning to fix that. If there's been some decision made that we don't have to support the historical URLs for our archives pages, I think that's a really bad plan; those links are in a lot more places than just the Todo.

As for your actual question, the TODO list is an accumulation of items that someone, sometime over the last ten years, thought would be valuable to work on, and nobody objected too strenuously to the idea. The fact that an item is in the TODO list doesn't mean that anyone is working on it now, that anyone will ever work on it, or that people would still think it was a good idea if it were re-proposed today. It's just a list to jump-start people's thinking, and shouldn't be taken as particularly official.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 2:13 PM, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck
> singularsyn...@gmail.com wrote:
>> Many of the links in the TODO wiki page result in a page not found error.
>> Is this page up-to-date? Can anything be inferred about the status of
>> these items from the broken link?
>
> I think what we can infer is that the new archives code is broken. I hope
> someone is planning to fix that.

Yes, we can infer that. It makes it a whole lot easier to fix something with better bug reports than that, of course, as I'm sure you (Robert in this case, not Stephen) are generally aware.

I've reverted a patch that was applied a few days ago that dealt with how URLs are parsed, and I think that's the one that's responsible. But it would be good to have an actual example of what didn't work, because the links I tried all worked...

> If there's been some decision made that we don't have to support the
> historical URLs for our archives pages, I think that's a really bad plan;
> those links are in a lot more places than just the Todo.

No, the plan has always been to support those. There are no plans to remove that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote:
> Hm, true, but for that to prevent the problem the inner processing loop
> needs to always read up to EOF before it exits and we attempt to send a
> reply, which I don't think it necessarily does.
>
> Assume that the master sends a chunk of data, waits a bit, and finally
> sends the shutdown record and exits. The slave might then receive the
> first chunk, and it might trigger sending a reply. At the time the reply
> is sent, the master has already sent the shutdown record and closed the
> connection, and we'll thus fail to reply and abort. Since the shutdown
> record has never been read from the socket, XLogWalRcvFlush won't flush
> it, and the slave ends up behind the master.
>
> Also, since XLogWalRcvProcessMsg responds to keep-alive messages, we might
> also error out of the inner processing loop if the server closes the
> socket after sending a keepalive but before we attempt to respond.
>
> Fixing this on the receive side alone seems quite messy and fragile. So
> instead, I think we should let the master send a shutdown message after it
> has sent everything it wants to send, and wait for the client to
> acknowledge it before shutting down the socket. If the client fails to
> respond, we could log a fat WARNING.

Your explanation seems okay, but before discussing the exact solution it might be better to first reproduce the actual problem.

With Regards,
Amit Kapila.
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 8:48 AM, Magnus Hagander mag...@hagander.net wrote:
> Yes, we can infer that. It makes it a whole lot easier to fix something
> with better bug reports than that, of course, as I'm sure you (Robert in
> this case, not Stephen) are generally aware.
>
> I've reverted a patch that was applied a few days ago that dealt with how
> URLs are parsed, and I think that's the one that's responsible. But it
> would be good to have an actual example of what didn't work, because the
> links I tried all worked...

Hmm. Sorry for the lack of detail. I assumed the problem was obvious and widespread because I clicked on the first link I saw in the Todo and it didn't work. But after clicking a bunch more links from the Todo, I only found three that fail:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php
http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

That last one works if I change %40 to @, so that one might be a wiki problem rather than an archives problem. In fact, for all I know the other two might have been broken all along too; I'm just assuming they used to work.

Sorry for going overboard,
...Robert

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 3:14 PM, Robert Haas robertmh...@gmail.com wrote:
> Hmm. Sorry for the lack of detail. I assumed the problem was obvious and
> widespread because I clicked on the first link I saw in the Todo and it
> didn't work. But after clicking a bunch more links from the Todo, I only
> found three that fail:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php

Works now, so that seems to have been fixed by the reverting of the patch. It might be a while before they all recover due to caching issues, but both of these work now for me, which seems to indicate the fix is the right one.

> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

It works with %40 for me now, so it might have been related - can you check if it is still an issue for you? It might be different in different browsers.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 9:23 AM, Magnus Hagander mag...@hagander.net wrote:
> Works now, so that seems to have been fixed by the reverting of the patch.
> It might be a while before they all recover due to caching issues, but
> both of these work now for me, which seems to indicate the fix is the
> right one.
>
>> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/
>
> It works with %40 for me now, so it might have been related - can you
> check if it is still an issue for you? It might be different in different
> browsers.

Yeah, it seems OK now. Thanks for the quick response.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Fri, Apr 12, 2013 at 2:44 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> Yes, of course. The attached one replaces the getObjectDescription in
> sepgsql/proc.c, and relative changes in regression test.

Thanks. Committed. I also committed the first two hunks of your cleanup patch but omitted the third one, which is not well-worded and seems like overkill anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] event trigger API documentation?
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
> I'm not sure about ripping it out; that does not sound like a good idea to
> me. It does need some additions and C-level examples, yes.
>
> The plan was to build a contrib module as an example that would cancel any
> (supported) command you try to run by means of ereport(ERROR, …), then add
> that in pieces to the docs with details about what's going on. While the
> commit fest was still running didn't look like the right time to work on
> that; beta looks like the right time.
>
> What do you think about that proposal?

We're not adding new contrib modules during beta. Expanding the documentation seems like a fine beta-period activity, though.

regards, tom lane
Re: [HACKERS] erroneous restore into pg_catalog schema
On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Jan 29, 2013 at 2:30 PM, Alvaro Herrera wrote:
>>> Robert, are you working on this?
>>
>> I wasn't, but I can, if we agree on it.
>
> I think we need to do *something* (and accordingly have added this to the
> 9.3 open items page so we don't forget about it). Whether Robert's idea is
> the best one probably depends in part on how clean the patch turns out to
> be.

The attached patch attempts to implement this. I discovered that, in fact, we have a number of places in our initdb-time scripts that rely on the current behavior, but they weren't hard to fix; and in fact I think the extra verbosity is probably not a bad thing here. See what you think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: explicit-pg-catalog.patch (binary data)
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 2:26 AM, Florian Pflug f...@phlo.org wrote:

This raises two questions. First, why are there two primes? You could just as well use a single prime q and set p = q^64 mod 2^16. You then get

    S = sum V[i,j] * q^(64*(64-i) + (64-j))
      = sum V[i,j] * q^(4096 - 64*(i-1) - j)

You get higher prime powers that way, but you can easily choose a prime that yields distinct values mod 2^16 for exponents up to 16383. Your PRIME2, for example, does. (It wraps around for 16384, i.e. PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime, since 16384 is the Carmichael function's value at 2^16.)

The experimental detection rate is about the same if we use a single prime. But I think you have the analytical form wrong here. It should be, given q = p:

    S = sum V[i,j] * p^(64-i) * p^(64-j)
      = sum V[i,j] * p^(64 - i + 64 - j)
      = sum V[i,j] * p^(128 - i - j)

Yeah, if you set q = p that's true. My suggestion was p = q^64 though...

So it was; I guess it was too late here and I missed it. All things considered that is a good suggestion — if for nothing else, the generic implementation can be smaller this way.

Second, why does it use addition instead of XOR? It seems that FNV usually XORs the terms together instead of adding them?

Testing showed a slightly better detection rate for adds. Intuitively I think it's because the carry introduces some additional mixing.

Hm, but OTOH it makes S linear in V, i.e. if you have two inputs V1, V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then S' = S*m. The second property is quite undesirable, I think. Assume all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit positions 0..(k-1). Then, due to linearity, S is also divisible by 2^k, i.e. also has no ones before the k-th bit. This means, for example, that if you hash values which all have their lowest bit cleared, you get only 2^15 distinct hash values.
If they all have the two lowest bits cleared, you get only 2^14 distinct values, and so on… Generally, linearity doesn't seem to be a property that one wants in a hash, I think, so my suggestion is to stick to XOR.

This made me remember, the issue I had was with high-order bits, not with low-order ones; somehow I got them confused. The exact issue is that the high-order bits don't affect any bit lower than them. It's easy to see that if you remember the shift-and-add nature of multiplication. Unfortunately XOR will not fix that, and neither will adding an offset basis. This is the fundamental thing behind the not-so-great uncorrelated bit error detection rate.

While I understand that linearity is not a desirable property, I couldn't think of a realistic case where it would hurt. I can see how it can hurt checksums of variable-length values, but for our fixed-buffer case it's definitely not so clear cut. On the pro side, the distributive property that is behind linearity allowed me to do the final aggregation in tree form, performing the multiplies in parallel instead of linearly. This adds up to the difference between 250 cycles (64*(3-cycle IMUL + 1-cycle XOR)) and 25 cycles (4*5-cycle pmullw + 5-cycle addw). Given that the main loop is about 576 cycles, this is a significant difference.

Here, btw, is a page on FNV hashing. It mentions a few rules for picking suitable primes: http://www.isthe.com/chongo/tech/comp/fnv Unfortunately the rules don't apply here because of the hash size.

Yeah :-(. I noticed that their 32-bit prime only has a single one bit outside the first 16 bits. Maybe we can take advantage of that and use a 32-bit state while still providing decent performance on machines without a 32-bit × 32-bit → 32-bit multiply instruction?

Looking at the Power instruction set, a 32-bit multiply by the FNV prime would look like this:

    vmulouh   tmp1, hash, prime
    vmladduhm tmp1, hash, prime16
    vslw      tmp2, hash, 24
    vadduwm   hash, tmp1, tmp2

That is 4 instructions to multiply 4 values.
Depending on the specific execution ports on the processor it might be faster or slower than the scalar version, but not by a whole lot. The main benefit would be that the intermediate state could be held in registers.

If we lived in an Intel-only world, I'd suggest going with a 32-bit state, since SSE4.1 support is *very* widespread already - the last CPUs without it came out over 5 years ago, I think. (Core2 and later support SSE4.1, and some later Core1 do too.) But unfortunately things look bleak even for other x86 implementations - AMD supports SSE4.1 only starting with Bulldozer, which came out in 2011 or so, I believe. Leaving the x86 realm, it seems that only ARM's NEON provides the instructions we'd need - AltiVec seems to support only 16-bit multiplies, and from what some quick googling brought up, MIPS and SPARC SIMD instructions look no better. OTOH, chances are that nobody will ever do SIMD implementations for those machines. In that case, working in 32-bit chunks instead of
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 05:47:55PM +0300, Ants Aasma wrote:
> The SSE4.1 implementation of this would be as fast as the last patch, the
> generic version will be faster, and we avoid the linearity issue. By using
> different offsets for each of the partial hashes we don't directly suffer
> from the commutativity of the final xor folding. By using the
> xor-then-multiply variant, the last values hashed have their bits mixed
> before folding together.
>
> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86s that support SSE4.1, and
> the fact that AMD CPUs before 2011 will run it an order of magnitude
> slower (but still faster than the best CRC).
>
> Any opinions on whether it would be a reasonable tradeoff to have a better
> checksum with great performance on the latest x86 CPUs and good
> performance on other architectures, at the expense of having only OK
> performance on older AMD CPUs? Also, any good suggestions for where we
> should do CPU detection if we go this route?

As much as I love the idea of improving the algorithm, it is disturbing that we are discussing this so close to beta, with an algorithm that is under analysis, with no (runtime) CPU detection, and in something that is going to be embedded into our data page format. I can't even think of another case where we do run-time CPU detection.

I am wondering if we need to tell users that pg_upgrade will not be possible if you enable page-level checksums, so we are not trapped with something we want to improve in 9.4.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 17:09 , Bruce Momjian br...@momjian.us wrote:
> As much as I love the idea of improving the algorithm, it is disturbing
> that we are discussing this so close to beta, with an algorithm that is
> under analysis, with no (runtime) CPU detection, and in something that is
> going to be embedded into our data page format. I can't even think of
> another case where we do run-time CPU detection.

We could still ship the new checksum algorithm with 9.3, but omit the SSE-optimized version, i.e. include only the plain C implementation. I think Ants mentioned somewhere that gcc does a pretty good job of vectorizing that, so people who really care (and who use GCC) could compile with -msse4.1 -funroll-loops -ftree-vectorize and get performance close to that of a hand-coded SSE version.

The important decision we're facing is which algorithm to use. I personally believe Ants is on the right track there - FNV or a variant thereof looks like a good choice to me, but the details have yet to be nailed down, I think.

However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for that - we could make version 5 mean "just like version 4, but with a different checksum algorithm". Since the layout wouldn't actually change, that'd be far easier to pull off than actually supporting multiple page layouts.

If that works, then shipping 9.3 with CRC is probably the best solution. If not, we should see to it that something like Ants' parallel version of FNV, or a small variant of it, gets into 9.3 if at all possible, IMHO.

best regards,
Florian Pflug
Re: [HACKERS] event trigger API documentation?
On 4/17/13 5:41 AM, Dimitri Fontaine wrote:
> I'm not sure about ripping it out; that does not sound like a good idea to
> me. It does need some additions and C-level examples, yes.
>
> The plan was to build a contrib module as an example that would cancel any
> (supported) command you try to run by means of ereport(ERROR, …), then add
> that in pieces to the docs with details about what's going on. While the
> commit fest was still running didn't look like the right time to work on
> that; beta looks like the right time.

Well, if documentation had been available well before beta, other procedural languages might have gained support for event triggers. If it's not being documented, it might not happen very soon.

It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited.
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote:
> This made me remember, the issue I had was with high-order bits, not with
> low-order ones; somehow I got them confused. The exact issue is that the
> high-order bits don't affect any bit lower than them. It's easy to see
> that if you remember the shift-and-add nature of multiplication.
> Unfortunately XOR will not fix that, and neither will adding an offset
> basis. This is the fundamental thing behind the not-so-great uncorrelated
> bit error detection rate.

Right. We could maybe fix that by extending the update step to

    t = s[j] ^ d[i,j]
    s[j] = (t * PRIME) ^ (t >> 1)

or something like that. Shifting t instead of (t * PRIME) should help to reduce the performance impact, since a reordering CPU should be able to parallelize the multiply and the shift. Note though that I haven't really thought that through extensively - the general idea should be sound, but whether 1 is a good shift amount I do not know.

> While I understand that linearity is not a desirable property, I couldn't
> think of a realistic case where it would hurt. I can see how it can hurt
> checksums of variable-length values, but for our fixed-buffer case it's
> definitely not so clear cut. On the pro side, the distributive property
> that is behind linearity allowed me to do the final aggregation in tree
> form, performing the multiplies in parallel instead of linearly. This adds
> up to the difference between 250 cycles (64*(3-cycle IMUL + 1-cycle XOR))
> and 25 cycles (4*5-cycle pmullw + 5-cycle addw). Given that the main loop
> is about 576 cycles, this is a significant difference.
>
> I wonder, if we use 32-bit FNV-1a's (the h = (h^v)*p variant) with
> different offset-basis values, would it be enough to just XOR fold the
> resulting values together. The algorithm looking like this:

Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly.
But if we're worried about that, we could use your linear combination method for the aggregation phase.

> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86s that support SSE4.1, and
> the fact that AMD CPUs before 2011 will run it an order of magnitude
> slower (but still faster than the best CRC).

Hm, CPU detection isn't that hard, and given the speed at which Intel currently invents new instructions we'll end up going that route sooner or later anyway, I think.

> Any opinions on whether it would be a reasonable tradeoff to have a better
> checksum with great performance on the latest x86 CPUs and good
> performance on other architectures at the expense of having only OK
> performance on older AMD CPUs?

The loss on AMD is offset by the increased performance on machines where we can't vectorize, I'd say.

best regards,
Florian Pflug
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote:
> Is there any way we can change the checksum algorithm in 9.4 *without*
> breaking pg_upgrade?

Personally, I think we're going to need a solution for page format changes someday. What advantages are we postponing now to avoid it?

* 32-bit checksums?
* Being able to enable/disable checksums?

Anything else?

--
greg
[HACKERS] Changing schema on the fly
Hello dear -hackers,

I'm maintaining pg_reorg/pg_repack, which, as you may know, effectively allows online VACUUM FULL or CLUSTER. It works by installing logging triggers to keep data up to date during the migration, creating a copy of the table, and eventually swapping the tables' relfilenodes.

The new table is forced to keep exactly the same physical structure, e.g. restoring dropped columns too. Failing to do so was apparently a big mistake, looking at this commit [1]. My knowledge of Postgres at that level is limited: what I imagine is that cached plans keep the offset of the field in the row, so data ends up read/written in the wrong position if such an offset changes. The commit message mentions views and stored procedures being affected.

Is there a way to force invalidation of all the caches that may hold a reference to the column offsets? Or is the problem an entirely different one, and the above cache invalidation wouldn't be enough?

If we managed to allow schema changes in pg_repack we could offer many more online manipulation features: changing data types, reordering columns, really dropping columns to free up space, etc.

Thank you very much,

-- Daniele

[1] https://github.com/reorg/pg_repack/commit/960930b645df8eeeda15f176c95d3e450786f78a
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote:
> However, you're right that time's running out. It'd be a shame though if
> we'd lock ourselves into CRC as the only available algorithm essentially
> forever. Is there any way we can change the checksum algorithm in 9.4
> *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for
> that - we could make version 5 mean "just like version 4, but with a
> different checksum algorithm". Since the layout wouldn't actually change,
> that'd be far easier to pull off than actually supporting multiple page
> layouts.
>
> If that works, then shipping 9.3 with CRC is probably the best solution.
> If not, we should see to it that something like Ants' parallel version of
> FNV, or a small variant of it, gets into 9.3 if at all possible, IMHO.

I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from a checksum to a no-checksum server.

Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems a no-checksum server could run just fine with checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose the checksum ability.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 18:15 , Bruce Momjian br...@momjian.us wrote: On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote: However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for that - we could make version 5 mean just like version 4, but with a different checksum algorithm. Since the layout wouldn't actually change, that'd be far easier to pull off than actually supporting multiple page layouts. If that works, then shipping 9.3 with CRC is probably the best solution. If not, we should see to it that something like Ants' parallel version of FNV or a smaller checksum get into 9.3 if at all possible, IMHO. I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from checksum to no-checksum servers. Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems pg_upgrade could run as a no-checksum server just fine that had checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose checksum ability. AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 06:33:58PM +0200, Florian Pflug wrote: I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from checksum to no-checksum servers. Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems pg_upgrade could run as a no-checksum server just fine that had checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose checksum ability. AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually. What advantages are we postponing now to avoid it? Um, other than the ability to make a release? We aren't going to hold up 9.3 until that particular bit of pie in the sky lands. Indeed I don't expect to see it available in the next couple years either. When we were looking at that seriously, two or three years ago, arbitrary page format changes looked *hard*. The idea of bumping the page format version number to signal a checksum algorithm change might work though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
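Tom's suggestion -- letting the page-format version number select the checksum algorithm while the layout stays unchanged -- could be sketched roughly as below. All function names, version semantics, and return values here are hypothetical, not actual PostgreSQL code:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t (*checksum_fn)(const void *page, size_t len);

/* Placeholder implementations; real code would compute an actual
 * checksum over the page contents. */
static uint16_t checksum_v4(const void *page, size_t len) { (void) page; (void) len; return 4; }
static uint16_t checksum_v5(const void *page, size_t len) { (void) page; (void) len; return 5; }

/* Hypothetical dispatch: the same page layout, but the value stored in
 * pd_pagesize_version picks the checksum algorithm. */
static checksum_fn checksum_for_page_version(uint16_t version)
{
    switch (version)
    {
        case 4:  return checksum_v4;  /* existing algorithm */
        case 5:  return checksum_v5;  /* same layout, new algorithm */
        default: return NULL;         /* unknown page version */
    }
}
```

Since the layout is identical for both versions, the server could verify and rewrite pages on the fly, which is what makes this cheaper than supporting genuinely different page layouts.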
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:22:01PM -0400, Tom Lane wrote: Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually. What advantages are we postponing now to avoid it? Um, other than the ability to make a release? We aren't going to hold up 9.3 until that particular bit of pie in the sky lands. Indeed I don't expect to see it available in the next couple years either. When we were looking at that seriously, two or three years ago, arbitrary page format changes looked *hard*. The idea of bumping the page format version number to signal a checksum algorithm change might work though. Uh, not sure how pg_upgrade would detect that, as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled Do we need to address this for 9.3? (Yuck) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty shortsighted. The field probably ought to be defined as containing a checksum algorithm ID number, not a boolean. But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] erroneous restore into pg_catalog schema
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think we need to do *something* (and accordingly have added this to the 9.3 open items page so we don't forget about it). Whether Robert's idea is the best one probably depends in part on how clean the patch turns out to be. The attached patch attempts to implement this. I discovered that, in fact, we have a number of places in our initdb-time scripts that rely on the current behavior, but they weren't hard to fix; and in fact I think the extra verbosity is probably not a bad thing here. See what you think. I think this breaks contrib/adminpack, and perhaps other extensions. They'd not be hard to fix with script changes, but they'd be broken. In general, we would now have a situation where relocatable extensions could never be installed into pg_catalog. That might be OK, but at least it would need to be documented. Also, I think we'd be pretty much hard-wiring the decision that pg_dump will never dump objects in pg_catalog, because its method for selecting the creation schema won't work in that case. That probably is all right too, but we need to realize it's a consequence of this. As far as the code goes, OK except I strongly disapprove of removing the comment about temp_missing at line 3512. The coding is not any less a hack in that respect for having been pushed into a subroutine. If you want to rewrite the comment, fine, but failing to point out that something funny is going on is not a service to readers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty shortsighted. The field probably ought to be defined as containing a checksum algorithm ID number, not a boolean. But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. Well, my idea was that pg_upgrade would allow upgrades from old clusters with the same checksum algorithm version, but not non-matching ones. This would allow the checksum algorithm to be changed and force pg_upgrade to fail. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Bruce Momjian br...@momjian.us writes: On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. Well, my idea was that pg_upgrade would allow upgrades from old clusters with the same checksum algorithm version, but not non-matching ones. This would allow the checksum algorithm to be changed and force pg_upgrade to fail. It's rather premature to be defining pg_upgrade's behavior for a situation that doesn't exist yet, and may very well never exist in that form. It seems more likely to me that we'd want to allow incremental algorithm changes, in which case pg_upgrade ought not do anything about this case anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger API documentation?
Peter Eisentraut pete...@gmx.net writes: Well, if documentation had been available well before beta, other procedural languages might have gained support for event triggers. If it's not being documented, it might not happen very soon. It's been a moving target for the last two years, and until very recently what to document was not clear enough to spend any time on actually writing the docs. Please also note that the first series of patches did include the support code for all the core PLs, but Robert didn't feel like committing that, and no other committer stepped up. I'm struggling to understand how to properly solve the problem here from an organisational perspective. Before beta was not a good time for the people involved, and it was not a good time for other people to get involved. Beta is not a good time to fix what couldn't be done before. When are we supposed to work on the rough edges left when a patch went through 8 commit fests and so many discussions that it's quite hard indeed to step back and understand what's in and what's missing to make it sensible for the release? Maybe the right answer is to remove the documentation about event triggers completely for 9.3 and tell the users about them later, when we have something other than just internal infrastructure. Now, if it's ok to add support to the other PLs, I can cook a patch from the bits I had done last year; the only work should be removing variables. It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited. Well, you do realise that the only information you get passed down to the event trigger code explicitly are the event name and the command tag, and nothing else, right? 
If you have a use case that requires any other information, then documenting the event triggers will do nothing to help you implement it; you will need to code in C and go look at the backend sources. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix typo in contrib/hstore/crc32.c comment
Hi all, The attached patch fixes a little typo in a contrib/hstore/crc32.c comment. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello fix_typo_hstore_crc32_comment.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if someone has checksums enabled and wants to disable it, why is pg_upgrade the right time to do that? Wouldn't it make more sense to allow them to do that at any time? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-17 at 16:58 +0100, Greg Stark wrote: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually What advantages are we postponing now to avoid it? * 32-bit checksums? * Being able to enable/disable checksums? Anything else? I'm not sure that changing the page format is the most difficult part of enabling/disabling checksums. It's easy enough to have page header bits if the current information is not enough (and those bits were there, but Heikki requested their removal and I couldn't think of a concrete reason to keep them). Eventually, it would be nice to be able to break the page format and have more space for things like checksums (and probably a few other things, maybe some visibility-related optimizations). But that's a few years off and we don't have any real plan for that. What I wanted to accomplish with this patch is the simplest checksum mechanism that we could get that would be fast enough that many people would be able to use it. I expect it to be useful until we do decide to break the page format. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger API documentation?
On 4/17/13 3:20 PM, Dimitri Fontaine wrote: It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited. Well, you do realise that the only information you get passed down to the event trigger code explicitely are the event name and the command tag, and nothing else, right? Offhand, that seems about enough, but I'm just beginning to explore. Chances are, event triggers will end up somewhere near the top of the release announcements, so we should have a consistent message about what to do with them and how to use them. If for now, we say, we only support writing them in PL/pgSQL, and here is how to do that, and here are some examples, that's fine. But currently, it's not quite clear. Surely you had some use cases in mind when you set out to implement this. What were they, and where are we now in relation to them? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I think his point is why don't we clear currval() on DISCARD ALL? I can't think of a good reason we don't. Because we'd have to invent a new suboperation DISCARD SEQUENCES, for one thing, in order to be consistent. I'd rather ask why it's important that we should throw away such state. It doesn't seem to me to be important enough to justify a new subcommand. Consistency is a philosophical thing. The practical reason for subcommands is the possibility of a partial reset for special situations, pooling or otherwise. But such usage seems rather rare in real life. If the sequences are not worth a subcommand, then let's not give them one and just wait until someone comes up with an actual reason to have one. But currval() is quite a noticeable thing that DISCARD ALL should clear. Or, if you'd rather a more direct answer: wanting this sounds like evidence of bad application design. Why is your app dependent on getting failures from currval, and isn't there a better way to do it? It may not sound like it, but that's exactly the request - because DISCARD ALL leaves user-visible state around, it's hard to fix an application that depends on broken assumptions. In fact, it was a surprise to me that currval() works across transactions. My alternative proposal would be to get rid of such silly behaviour... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:59:12PM -0700, Jeff Davis wrote: On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if someone has checksums enabled and wants to disable it, why is pg_upgrade the right time to do that? Wouldn't it make more sense to allow them to do that at any time? Well, right now, pg_upgrade is the only way you could potentially turn off checksums. You are right that we might eventually want a command, but my point is that we currently have a limitation in pg_upgrade that might not be necessary. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 6:54 PM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote: This made me remember, the issue I had was with high order bits, not with low order ones; somehow I got them confused. The exact issue is that the high order bits don't affect any bit lower than them. It's easy to see that if you remember the shift-and-add nature of multiply. Unfortunately XOR will not fix that. Neither will adding an offset basis. This is the fundamental thing that is behind the not-so-great uncorrelated bit error detection rate. Right. We could maybe fix that by extending the update step to t = s[j] ^ d[i,j]; s[j] = (t * PRIME) ^ (t >> 1) or something like that. Shifting t instead of (t * PRIME) should help to reduce the performance impact, since a reordering CPU should be able to parallelize the multiply and the shift. Note though that I haven't really thought that through extensively - the general idea should be sound, but whether 1 is a good shift amount I do not know. I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. I went ahead and coded up both the parallel FNV-1a and parallel FNV-1a + srl1-xor variants and ran performance tests and detection rate tests on both. 
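For concreteness, Florian's proposed update step written as scalar C looks like the sketch below. This is illustrative only -- the actual patch runs many such lanes in parallel with SSE on 16-bit words, and the prime here is the standard 32-bit FNV prime, which is an assumption rather than the patch's exact constant:

```c
#include <stdint.h>

#define FNV_PRIME 16777619u   /* standard 32-bit FNV prime (assumption) */

/* One update step of FNV-1a extended with a shifted feedback term: the
 * "(t >> 1)" xor lets high-order bits influence lower-order ones, which
 * the plain xor-and-multiply step does not. */
static uint32_t fnv1a_srl1_step(uint32_t sum, uint32_t data)
{
    uint32_t t = sum ^ data;
    return (t * FNV_PRIME) ^ (t >> 1);
}
```

Because the multiply and the shift both depend only on t, an out-of-order CPU can execute them in parallel, which is why shifting t rather than (t * PRIME) is cheaper.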
Performance results:

    Mul-add checksums:  12.9 bytes/cycle
    FNV-1a checksums:   13.5 bytes/cycle
    FNV-1a + srl-1:      7.4 bytes/cycle

Detection rates (false positive rates):

                      Add-mul      FNV-1a    FNV-1a + srl-1
    Single bit flip:  1:inf        1:129590  1:64795
    Double bit flip:  1:148        1:511     1:53083
    Triple bit flip:  1:673        1:5060    1:61511
    Quad bit flip:    1:1872       1:19349   1:68320
    Write 0x00 byte:  1:774538137  1:118776  1:68952
    Write 0xFF byte:  1:165399500  1:137489  1:68958
    Partial write:    1:59949      1:71939   1:89923
    Write garbage:    1:64866      1:64980   1:67732
    Write run of 00:  1:57077      1:61140   1:59723
    Write run of FF:  1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to the end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes. So adding in the shifted value nearly cuts the performance in half. I think that by playing with the instruction order I might coax the CPU scheduler into scheduling the instructions better, but even in the best case it will be somewhat slower. The point to keep in mind is that even this slower speed is still faster than hardware-accelerated CRC32, so all in all the hit might not be so bad. The effect on false positive rates for double bit errors is particularly impressive. I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster. I wonder: if we used 32-bit FNV-1a's (the h = (h^v)*p variant) with different offset-basis values, would it be enough to just XOR-fold the resulting values together? The algorithm would look like this: Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly. 
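Ants's question -- whether xor-folding independent 32-bit FNV-1a lanes seeded with distinct offset bases is an adequate aggregation step -- can be sketched as below. The lane count and the offset-basis values are illustrative only, not the patch's actual parameters:

```c
#include <stdint.h>
#include <stddef.h>

#define FNV_PRIME 16777619u
#define N_LANES 4   /* illustrative; the real patch uses enough lanes for 128 bytes */

/* Stripe 32-bit words across N_LANES independent FNV-1a accumulators,
 * each seeded with its own offset basis, then xor-fold the partial sums
 * into a single 32-bit result. */
static uint32_t fnv1a_xor_folded(const uint32_t *words, size_t nwords)
{
    /* distinct arbitrary offset bases derived from the standard one */
    uint32_t s[N_LANES] = {2166136261u, 2166136262u, 2166136263u, 2166136264u};
    uint32_t result = 0;

    for (size_t i = 0; i + N_LANES <= nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
            s[j] = (s[j] ^ words[i + j]) * FNV_PRIME;   /* h = (h ^ v) * p */

    for (size_t j = 0; j < N_LANES; j++)
        result ^= s[j];
    return result;
}
```

As Florian notes, the plain xor fold is slightly weaker against permutations that swap words between lanes; the different offset bases mitigate that, since identical inputs in different lanes still yield different partial sums.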
But if we're worried about that, we could use your linear combination method for the aggregation phase. I don't think it significantly reduces resilience to permutations thanks to using different basis offsets and multiply not distributing over xor. Speaking against this option is the fact that we will need to do CPU detection at startup to make it fast on the x86 that support SSE4.1, and the fact that AMD CPUs before 2011 will run it an order of magnitude slower (but still faster than the best CRC). Hm, CPU detection isn't that hard, and given the speed at which Intel currently invents new instructions we'll end up going that route sooner or later anyway, I think. Sure it's not that hard but it does have an order of magnitude more design decisions than #if defined(__x86_64__). Maybe a first stab could avoid a generic infrastructure and just have the checksum function as a function pointer, with the default trampoline implementation running a cpuid and overwriting the function pointer with either the optimized or generic versions and then calling it. Any opinions if it would be a reasonable tradeoff to have a better checksum with great performance on latest x86 CPUs and good performance on other architectures at the expense of having only ok performance on older AMD CPUs? The loss on AMD is offset by the increased performance on machines where we can't vectorize, I'd say. +1 Old AMD machines won't soon be used by anyone caring
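The function-pointer trampoline Ants describes might look like the following minimal sketch. The CPU probe is stubbed out here (a real build would use CPUID or a compiler builtin), and the checksum bodies are placeholders, not the patch's algorithm:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t (*checksum_impl)(const void *data, size_t len);

/* Placeholder scalar implementation; a real build would pair it with a
 * vectorized SSE4.1 variant. */
static uint32_t checksum_generic(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + p[i];
    return sum;
}

static uint32_t checksum_sse41(const void *data, size_t len)
{
    return checksum_generic(data, len);   /* stand-in for the vectorized path */
}

/* Stubbed CPU probe; a real build would execute CPUID here. */
static int cpu_has_sse41(void) { return 0; }

static uint32_t checksum_trampoline(const void *data, size_t len);

/* All callers go through this pointer. It starts out aimed at the
 * trampoline, which probes the CPU once, rebinds the pointer to the
 * chosen implementation, and calls through it; every later call skips
 * the probe entirely. */
static checksum_impl page_checksum = checksum_trampoline;

static uint32_t checksum_trampoline(const void *data, size_t len)
{
    page_checksum = cpu_has_sse41() ? checksum_sse41 : checksum_generic;
    return page_checksum(data, len);
}
```

This avoids any generic CPU-detection infrastructure: the dispatch cost is a single indirect call, paid on every invocation anyway, and the probe runs exactly once.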
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
Marko Kreen mark...@gmail.com writes: On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I think his point is why don't we clear currval() on DISCARD ALL? I can't think of a good reason we don't. Because we'd have to invent a new suboperation DISCARD SEQUENCES, for one thing, in order to be consistent. I'd rather ask why it's important that we should throw away such state. It doesn't seem to me to be important enough to justify a new subcommand. consistency is a philosophical thing. No, it's a critical tool in complexity management. When you're dealing with systems as complicated as a database, every little non-orthogonal detail adds up. DISCARD ALL has a clear definition in terms of simpler commands, and it's going to stay that way. Either this is worth a subcommand, or it's not worth worrying about at all. But currval() is quite noticeable thing that DISCARD ALL should clear. If it were as obvious and noticeable as all that, somebody would have noticed before now. We've had DISCARD ALL with its current meaning since 8.3, and nobody complained in the five-plus years since that shipped. At this point, even if a concrete case were made why DISCARD ALL should clear currval (and I repeat that no credible case has been made; nobody has for example pointed to a reasonably-well-designed application that this breaks), there would be a pretty strong backwards-compatibility argument not to change it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other latencies effectively executing one each of movdqu/pmullw/paddw each cycle, that's why the N_SUMS adds up to 128 bytes not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes. I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add \ns command to psql
On Tue, Apr 16, 2013 at 5:40 AM, Colin 't Hart co...@sharpheart.org wrote: Here's a new version of a small patch to psql I'm using locally. It adds a command \ns to psql which is a shortcut to set the SEARCH_PATH variable. I've also added tab completion making this command much more useful. I don't think tab completion would be possible if this command was defined as a variable (which was another suggestion offered at the time). It's possible that the tab completion argument is a sufficient reason for including this, but I'm kinda skeptical. The amount of typing saved is pretty minimal, considering that set sea<TAB> completes to set search_path. Assuming we had proper tab completion for set search_path = (and off-hand, it doesn't look like that does anything useful), this would be saving 5 keystrokes every time you want to change the search path (set sea<TAB> is eight keystrokes, where \ns<space> is four... but it also saves you the semicolon at the end). I'm sure some people would find that worthwhile, but personally, I don't. Short commands are cryptic, and IMHO psql is already an impenetrable thicket of difficult-to-remember abbreviations. I've been using it for more than 10 years now and I still have to run \? on a semi-regular basis. I think that if we start adding things like this, that help message is going to rapidly fill up with a whole lot more abbreviations for things that are quite a bit incrementally less useful than what's there right now. After all, if we're going to have \ns to set the search path, why not have something similar for work_mem, or random_page_cost? I set both of those variables more often than I set search_path; and there could easily be someone else out there whose favorite GUC is client_encoding or whatever. And, for that matter, why stop with GUCs? 
\ct for CREATE TABLE would save lots of typing, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Apr18, 2013, at 00:32 , Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. Micro-optimization for particular CPUs, yes, but general performance considerations, no. For example, 2^n is probably one of the worst moduli you can pick for a hash function - any prime would work much better. But doing the computations modulo 2^16 or 2^32 carries zero performance overhead, whereas picking another modulus requires some renormalization after every operation. That, however, is *not* a given - it stems from the fact that nearly all CPUs in existence operate on binary integers. This fact must thus enter into the design phase very early, and makes 2^16 or 2^32 a sensible choice for a modulus *despite* its shortcomings, simply because it allows for fast implementations. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes. I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. I'd say lots of attention is paid to that, but there's *also* attention paid to speed. Which is good, because ideally we want to end up with a checksum which both has good error-detection properties *and* good performance. 
If performance is of no concern to us, then there's little reason not to use CRC… And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. If you've got any pointers to literature on error-detection capabilities of CPU-friendly checksum functions, please share. I am aware of the vast literature on CRC, and also on some other algebraic approaches, but none of those even come close to the speed of FNV+shift (unless there's a special CRC instruction, that is). And there's also a ton of stuff on cryptographic hashing, but those are optimized for a completely different use-case... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
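Florian's modulus point can be made concrete with a tiny sketch (not from the thread; the constants and function names are illustrative): with a 2^16 modulus the reduction comes for free from the integer type, while a prime modulus needs an explicit remainder after each step.

```c
#include <stdint.h>

/* Illustrative only: a toy per-step hash update under two different
 * moduli. With modulus 2^16 the reduction is implicit in the 16-bit
 * type; with a prime modulus (65521, the Adler-32 prime, used here
 * purely as an example) an explicit renormalization is needed after
 * the multiply. */
static uint16_t step_mod_2_16(uint16_t h, uint16_t v)
{
    return (uint16_t) ((h + v) * 31u);              /* wraps mod 2^16 for free */
}

static uint16_t step_mod_prime(uint16_t h, uint16_t v)
{
    return (uint16_t) (((uint32_t) (h + v) * 31u) % 65521u);
}
```

On most hardware the `%` in the prime variant costs a multi-cycle divide (or a reciprocal multiply) on every step, which is exactly the renormalization overhead being discussed.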
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote:

Performance results:
Mul-add checksums: 12.9 bytes/cycle
FNV-1a checksums: 13.5 bytes/cycle
FNV-1a + srl-1: 7.4 bytes/cycle

Detection rates (false positive rates):

                 Add-mul      FNV-1a    FNV-1a + srl-1
Single bit flip: 1:inf        1:129590  1:64795
Double bit flip: 1:148        1:511     1:53083
Triple bit flip: 1:673        1:5060    1:61511
Quad bit flip:   1:1872       1:19349   1:68320
Write 0x00 byte: 1:774538137  1:118776  1:68952
Write 0xFF byte: 1:165399500  1:137489  1:68958
Partial write:   1:59949      1:71939   1:89923
Write garbage:   1:64866      1:64980   1:67732
Write run of 00: 1:57077      1:61140   1:59723
Write run of FF: 1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes.

Cool, thanks for testing that! The results for FNV-1a + srl-1 look promising, I think. Its failure rate is consistently about 1:2^16, which is the value you'd expect. That gives me some confidence that the additional shift is working as expected. BTW, which prime are you using for FNV-1a and FNV-1a+srl1? So adding in the shifted value nearly cuts the performance in half. I think that by playing with the instruction order I might coax the CPU scheduler to schedule the instructions better, but even in the best case it will be somewhat slower. The point to keep in mind is that even this slower speed is still faster than hardware-accelerated CRC32, so all in all the hit might not be so bad. Yeah. ~7 bytes/cycle still translates to over 10GB/s on a typical CPU, so that's still plenty fast I'd say... The effect on false positive rates for double bit errors is particularly impressive.
I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster. Maybe, but it also makes *only* bits 14 and 15 actually affect bits below them, because all others are shifted out. If you choose the right prime it may still work; you'd have to pick one with enough lower bits set so that every bit affects bit 14 or 15 at some point… All in all a small shift seems better to me - if 1 for some reason isn't a good choice, I'd expect 3 or so to be a suitable replacement, but nothing much larger… I should have some time tomorrow to spend on this, and will try to validate our FNV-1a modification, and see if I find a way to judge whether 1 is a good shift. I wonder if we use 32bit FNV-1a's (the h = (h^v)*p variant) with different offset-basis values, would it be enough to just XOR fold the resulting values together. The algorithm looking like this: Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly. But if we're worried about that, we could use your linear combination method for the aggregation phase. I don't think it significantly reduces resilience to permutations, thanks to using different offset bases and multiply not distributing over xor. Oh, yeah, I thought you were still using 0 as the base offset. If you don't, the objection is moot. best regards, Florian Pflug
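For readers following along, the per-word step of the "FNV-1a + srl-1" variant under discussion can be sketched as below. This is a best-effort reading of the thread, not the exact code that was benchmarked; in particular the placement of the shift and the final folding to 16 bits are assumptions.

```c
#include <stdint.h>

#define FNV_PRIME 16777619u    /* the 32-bit FNV prime named in the thread */

/* One plausible per-word step: the standard FNV-1a round
 * h = (h ^ v) * p, followed by mixing in a copy of the hash shifted
 * right by one ("srl-1"), so that high bits feed back into the low
 * 16 bits that survive truncation. */
static uint32_t fnv1a_srl1_step(uint32_t h, uint32_t v)
{
    h = (h ^ v) * FNV_PRIME;
    h = h ^ (h >> 1);          /* the srl-1 mixing step */
    return h;
}

/* Checksum a buffer of 32-bit words and fold down to 16 bits. */
static uint16_t fnv1a_srl1(const uint32_t *words, int n)
{
    uint32_t h = 2166136261u;  /* standard FNV-1a offset basis */
    for (int i = 0; i < n; i++)
        h = fnv1a_srl1_step(h, words[i]);
    return (uint16_t) (h ^ (h >> 16));   /* fold to 16 bits */
}
```

The extra shift-xor is what the performance numbers above show costing nearly half the throughput: it lengthens the dependency chain of each round by one operation.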
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 1:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. The large scale structure takes into account the trends in computer architecture. A lot more so than using anything straight out of the literature. Specifically, computer architectures have hit a wall in terms of sequential throughput, so the linear dependency chain in the checksum algorithm will be the bottleneck soon if it isn't already. From that it follows that a fast and future-proof algorithm should not calculate the checksum in a single long chain. The proposed algorithms divide the input into 64x64 and 32x64 chunks. It's easy to show that both convert the dependency chain from O(n) to O(sqrt(n)). Secondly, unless we pick something really popular, CPUs are unlikely to provide for us specifically, so the algorithm should be built from general purpose computational pieces. Vector integer multiply and xor are pretty much guaranteed to be there and fast on future CPUs. In my view it's much more probable to be available and fast on future CPUs than something like the Intel CRC32 acceleration. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes.
I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. I approached it from the angle of what needs to be done to get a fundamentally fast approach to have a good enough error detection rate, and not have a way of generating false positives that will give a likely error. The algorithms are simple enough and well studied enough that the rewards from tweaking them are negligible. I think the resulting performance speaks for itself. Now the question is what is a good enough algorithm. In my view, the checksum is more like a canary in the coal mine, not something that can be relied upon, and so ultimate efficiency is not that important if there are no obvious horrible cases. I can see that there are other views and so am exploring different tradeoffs between performance and quality. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. I haven't found much literature that is of use here. There is theory underlying this, coming from basic number theory and distilled into rules for hash functions. For the FNV hash the prime supposedly is carefully chosen, although all the literature so far just says it is a good choice without explaining why. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
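The O(sqrt(n)) argument above comes from breaking the single multiply/xor dependency chain into many independent ones. A scalar C sketch of that structure (illustrative only; the real patch uses vectorized 64x64 / 32x64 blocking and a different aggregation step):

```c
#include <stdint.h>
#include <stddef.h>

#define N_SUMS 32              /* number of independent partial checksums */
#define FNV_PRIME 16777619u

/* Each of the N_SUMS accumulators only sees every N_SUMS-th word, so
 * the CPU can keep many multiply/xor chains in flight instead of one
 * serial chain. The per-word step, the distinct offset bases, and the
 * XOR-fold aggregation are illustrative, not the thread's exact code. */
uint32_t parallel_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_SUMS];
    size_t i;

    for (i = 0; i < N_SUMS; i++)
        sums[i] = (uint32_t) (i + 1);   /* distinct offset bases per lane */

    for (i = 0; i < nwords; i++)
        sums[i % N_SUMS] = (sums[i % N_SUMS] ^ words[i]) * FNV_PRIME;

    /* Aggregation phase: XOR-fold the partial sums together. */
    uint32_t result = 0;
    for (i = 0; i < N_SUMS; i++)
        result ^= sums[i];
    return result;
}
```

The point is structural: the inner loop's iterations for different lanes have no data dependencies on each other, which is what maps well onto both superscalar execution and SIMD.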
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 2:25 AM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote:

Performance results:
Mul-add checksums: 12.9 bytes/cycle
FNV-1a checksums: 13.5 bytes/cycle
FNV-1a + srl-1: 7.4 bytes/cycle

Detection rates (false positive rates):

                 Add-mul      FNV-1a    FNV-1a + srl-1
Single bit flip: 1:inf        1:129590  1:64795
Double bit flip: 1:148        1:511     1:53083
Triple bit flip: 1:673        1:5060    1:61511
Quad bit flip:   1:1872       1:19349   1:68320
Write 0x00 byte: 1:774538137  1:118776  1:68952
Write 0xFF byte: 1:165399500  1:137489  1:68958
Partial write:   1:59949      1:71939   1:89923
Write garbage:   1:64866      1:64980   1:67732
Write run of 00: 1:57077      1:61140   1:59723
Write run of FF: 1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes.

Cool, thanks for testing that! The results for FNV-1a + srl-1 look promising, I think. Its failure rate is consistently about 1:2^16, which is the value you'd expect. That gives me some confidence that the additional shift is working as expected. BTW, which prime are you using for FNV-1a and FNV-1a+srl1? The official 32bit FNV one, 16777619. Offsets were just random numbers. Seems good enough given the following from the FNV page: These non-zero integers are the FNV-0 hashes of the following 32 octets: chongo <Landon Curt Noll> /\../\ The effect on false positive rates for double bit errors is particularly impressive. I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster.
Empirical results are slightly better with shift of 13:

Single bit flip: 1:61615
Double bit flip: 1:58078
Triple bit flip: 1:66329
Quad bit flip: 1:62141
Write 0x00 byte: 1:66327
Write 0xFF byte: 1:65274
Partial write: 1:71939
Write garbage: 1:65095
Write run of 0: 1:62845
Write run of FF: 1:64638

Maybe, but it also makes *only* bits 14 and 15 actually affect bits below them, because all others are shifted out. If you choose the right prime it may still work; you'd have to pick one with enough lower bits set so that every bit affects bit 14 or 15 at some point… All in all a small shift seems better to me - if 1 for some reason isn't a good choice, I'd expect 3 or so to be a suitable replacement, but nothing much larger… I don't think the big shift is a problem: the other bits were taken into account by the multiply, and with the larger shift the next multiplication will disperse the changes once again. Nevertheless, I'm running the tests with shift of 3 now. I should have some time tomorrow to spend on this, and will try to validate our FNV-1a modification, and see if I find a way to judge whether 1 is a good shift. Great. I will spend some brain cycles on it too. Regards, Ants Aasma
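The empirical rates quoted in this exchange come from Monte Carlo corruption testing. A rough sketch of what such a harness looks like (illustrative: it uses a plain serial FNV-1a truncated to 16 bits, not the exact parallel variant being measured, and the function names are made up):

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_WORDS (8192 / 4)   /* one 8K page as 32-bit words */

/* Plain serial FNV-1a over the page, folded down to 16 bits. */
static uint16_t checksum16(const uint32_t *w, int n)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < n; i++)
        h = (h ^ w[i]) * 16777619u;
    return (uint16_t) (h ^ (h >> 16));
}

/* Fill a random page, flip one random bit, and count how often the
 * checksum fails to change; the reported rate is then trials:misses. */
static int count_misses(int trials, unsigned seed)
{
    static uint32_t page[PAGE_WORDS];
    int misses = 0;

    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < PAGE_WORDS; i++)
            page[i] = ((uint32_t) rand() << 16) ^ (uint32_t) rand();
        uint16_t before = checksum16(page, PAGE_WORDS);

        int word = rand() % PAGE_WORDS;
        page[word] ^= 1u << (rand() % 32);      /* single bit flip */

        if (checksum16(page, PAGE_WORDS) == before)
            misses++;
    }
    return misses;
}
```

The other corruption modes in the table (byte overwrites, partial writes, garbage runs) differ only in how the page is mutated between the two checksum calls.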
Re: [HACKERS] Enabling Checksums
On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past review, since all the tests I've been doing the last two weeks are on how fast an AMD Opteron 6234 with OS cache shared_buffers can run this. The main thing I'm still worried about is what happens when you have a fast machine that can move memory around very quickly and an in-memory workload, but it's hamstrung by the checksum computation--and it's not a 2013 Intel machine. The question I started with here was answered to some depth and then skipped past. I'd like to jerk attention back to that, since I thought some good answers from Ants went by. Is there a simple way to optimize the committed CRC computation (or a similar one with the same error detection properties) based on either:

a) Knowing that the input will be an 8K page, rather than the existing use case with an arbitrary sized WAL section.

b) Straightforward code rearrangement or optimization flags.

That was all I thought was still feasible to consider changing for 9.3 a few weeks ago. And the possible scope has only been shrinking since then. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now:

-Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits.

-Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though.
But taking full advantage of that will require adding CPU-specific code to the database.

-The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 3:21 AM, Greg Smith g...@2ndquadrant.com wrote: On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past review, since all the tests I've been doing the last two weeks are on how fast an AMD Opteron 6234 with OS cache shared_buffers can run this. The main thing I'm still worried about is what happens when you have a fast machine that can move memory around very quickly and an in-memory workload, but it's hamstrung by the checksum computation--and it's not a 2013 Intel machine. The question I started with here was answered to some depth and then skipped past. I'd like to jerk attention back to that, since I thought some good answers from Ants went by. Is there a simple way to optimize the committed CRC computation (or a similar one with the same error detection properties) based on either: a) Knowing that the input will be an 8K page, rather than the existing use case with an arbitrary sized WAL section. b) Straightforward code rearrangement or optimization flags. That was all I thought was still feasible to consider changing for 9.3 a few weeks ago. And the possible scope has only been shrinking since then. Nothing from the two points, but the CRC calculation algorithm can be switched out for a slice-by-4 or slice-by-8 variant. Speed-up was around a factor of 4 if I remember correctly. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums.
Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. That was only the most glaring problem. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. Actually the state is that with the polynomial used there is currently close to zero hope of CPUs optimizing for us. By switching the polynomial we can have hardware acceleration on Intel CPUs, with little hope of others supporting it, given that AMD hasn't by now and Intel touts patents in this area. However the calculation can be made about a factor of 4 faster by restructuring it. This optimization is plain C and not CPU specific. The committed checksum is an order of magnitude slower than the Fletcher one that was performance tested with the patch. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started.
I'm not sure about the 9.4 part: if we ship with the builtin CRC as committed, there is a 100% chance that we will want to switch out the algorithm in 9.4, and there will be quite a large subset of users that will find the performance unusable. If we change it to whatever we come up with here, there is a small chance that the algorithm will give a worse than expected error detection rate in some circumstances and we will want to offer a better algorithm. More probably it will be good enough, and the low performance hit will allow more users to turn it on. This is a 16bit checksum that we're talking about, not SHA-1; it is expected to occasionally fail to detect errors. I can provide you with a patch of the generic version of any of the discussed algorithms within an hour, leaving plenty of time in beta or in 9.4 to accommodate the optimized versions. It's literally a dozen self-contained lines of code. Regards, Ants Aasma
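The slice-by-4 restructuring Ants mentions processes four input bytes per iteration with independent table lookups instead of one serial lookup per byte. A plain-C sketch (shown here with the reflected CRC-32C polynomial from the neighboring discussion; the same structure applies to the CRC-32 polynomial PostgreSQL actually uses):

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t crc_tab[4][256];

/* Build the four lookup tables. Table 0 is the classic byte-at-a-time
 * table for the reflected CRC-32C polynomial 0x82F63B78; tables 1..3
 * extend each entry by one extra zero byte, so four input bytes can be
 * folded into the CRC in one step. */
static void crc_init(void)
{
    for (int i = 0; i < 256; i++) {
        uint32_t c = (uint32_t) i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
        crc_tab[0][i] = c;
    }
    for (int i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
                            crc_tab[0][crc_tab[t - 1][i] & 0xff];
}

/* Slice-by-4: four independent lookups per iteration instead of one
 * dependent lookup per byte; a byte-at-a-time tail handles leftovers. */
static uint32_t crc32c_slice4(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len >= 4) {
        crc ^= (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
               ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        crc = crc_tab[3][crc & 0xff] ^ crc_tab[2][(crc >> 8) & 0xff] ^
              crc_tab[1][(crc >> 16) & 0xff] ^ crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return crc ^ 0xFFFFFFFFu;
}
```

The four lookups in the main loop have no dependencies on each other, which is where the roughly 4x speed-up over the byte-at-a-time loop comes from.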
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith g...@2ndquadrant.com wrote: Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. I was curious about the activity in this thread and wanted to understand the tradeoffs, and came to the same understanding as you when poking around. It seems the tough aspect of the equation is that the most well studied thing (CRC-32C) is slow unless you have special ISA support. Trying to find as much information and conclusive research on FNV was a lot more challenging; Fletcher is similar in that regard.
Given my hasty attempt to understand each of the alternatives, my qualitative judgement is that, strangely enough, the most conservative choice of the three (in terms of being understood and treated in the literature more than ten times over) is CRC-32C, but it's also the one being cast as only suitable inside micro-optimization. To add another, theoretically-oriented dimension to the discussion, I'd like to suggest it's also the most thoroughly studied of all the alternatives. I really had a hard time finding follow-up papers about the two alternatives, but to be fair, I didn't try very hard...then again, I didn't try very hard for any of the three, it's just that CRC32C was by far the easiest to find materials on. The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. Basically...if it comes down to understanding what the heck is going on and what the trade-offs are, it was a lot easier to brush up on CRC32-C in my meandering around the Internet. One might think this level of scrutiny would constitute a viable explanation of why CRC32C found its way into several standards and then finally into silicon. All in all, if the real-world costs of CRC32C on non-SSE4.2 hardware are allowable, I think it's the most researched and conservative option, although perhaps some of the other polynomials seen in Koopman could also be desirable. It seems there's a tradeoff in CRC polynomials between long-message and short-message error detection, and the paper above may allow for a more informed selection.
CRC32C is considered a good trade-off for both, but I haven't assessed the paper in enough detail to suggest whether there are specialized long-run polynomials that may be better still (although, then, there is also the microoptimization question, which postdates the literature I was looking at by a lot).
[HACKERS] confusing message about archive failures
When archive_command fails three times, it prints this message into the logs:

transaction log file "%s" could not be archived: too many failures

This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like "... too many failures, will try again later"?
Re: [HACKERS] Enabling Checksums
On 4/17/13 8:56 PM, Ants Aasma wrote: Nothing from the two points, but the CRC calculation algorithm can be switched out for a slice-by-4 or slice-by-8 variant. Speed-up was around a factor of 4 if I remember correctly...I can provide you with a patch of the generic version of any of the discussed algorithms within an hour, leaving plenty of time in beta or in 9.4 to accommodate the optimized versions. Can you nail down a solid, potential-for-commit slice-by-4 or slice-by-8 patch then? You dropped into things like per-byte overhead to reach this conclusion, which was fine to let the methods battle each other. Maybe I missed it, but I don't remember seeing an obvious full patch for this implementation then come back up from that. With the schedule pressure this needs to return to more database-level tests. Your concerns about the committed feature being much slower than the original Fletcher one are troubling, and we might as well do that showdown again now with the best of the CRC implementations you've found. Actually the state is that with the [CRC] polynomial used there is currently close to zero hope of CPUs optimizing for us. Ah, I didn't catch that before. It sounds like the alternate slicing implementation should also use a different polynomial then, which sounds reasonable. This doesn't even have to be exactly the same CRC function that the WAL uses. A CRC that's modified for performance or having a better future potential is fine; there's just a lot of resistance to using something other than a CRC right now. I'm not sure about the 9.4 part: if we ship with the builtin CRC as committed, there is a 100% chance that we will want to switch out the algorithm in 9.4, and there will be quite a large subset of users that will find the performance unusable. Now I have to switch out my reviewer hat for my 3-bit fortune telling one.
(It uses a Magic 8 Ball) This entire approach is squeezing what people would prefer to be a 32 bit CRC into a spare 16 bits, as a useful step advancing toward a long term goal. I have four major branches of possible futures here I've thought about:

1) Database checksums with 16 bits are good enough, but they have to be much faster to satisfy users. It may take a different checksum implementation altogether to make that possible, and distinguishing between the two of them requires borrowing even more metadata bits from somewhere. (This seems the future you're worried about)

2) Database checksums work out well, but they have to be 32 bits to satisfy users and/or error detection needs. Work on pg_upgrade and expanding the page headers will be needed. Optimization of the CRC now has a full 32 bit target.

3) The demand for database checksums is made obsolete by either mainstream filesystem checksumming, performance issues, or just general market whim. The 16 bit checksum PostgreSQL implements becomes a vestigial feature, and whenever it gets in the way of making changes someone proposes eliminating them. (I call this one the rules future)

4) 16 bit checksums turn out to be such a problem in the field that everyone regrets the whole thing, and discussions turn immediately toward how to eliminate that risk.

It's fair that you're very concerned about (1), but I wouldn't give it 100% odds of happening either. The user demand that's motivated me to work on this will be happy with any of (1) through (3), and in two of them optimizing the 16 bit checksums now turns out to be premature. -- Greg Smith
Re: [HACKERS] confusing message about archive failures
On Wednesday, April 17, 2013, Peter Eisentraut wrote: When archive_command fails three times, it prints this message into the logs: transaction log file \%s\ could not be archived: too many failures This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like ... too many failures, will try again later ? +1 I've found the current message alarming/confusing as well. But I don't really understand the logic behind bursting the attempts, 3 of them one second apart, then sleeping 57 seconds, in the first place. Cheers, Jeff
[HACKERS] Word-level bigrams/trigrams in tsvector
I'm wondering how I can store word-level bigrams/trigrams in a tsvector that I can query against. I was expecting the final query to match 'the air' and return the one tuple to me. For instance:

postgres=# create table docs (a tsvector);
CREATE TABLE
postgres=# insert into docs (a) values (strip('''the air'' smells ''sea water'''::tsvector));
INSERT 0 1
postgres=# select * from docs;
               a
--------------------------------
 'sea water' 'smells' 'the air'
(1 row)

postgres=# select * from docs where a @@ to_tsquery('''the air''');
 a
---
(0 rows)

Thanks, Alan