Re: [HACKERS] event trigger API documentation?
Peter Eisentraut pete...@gmx.net writes:
> I'm specifically looking for C API documentation, along the lines of
> http://www.postgresql.org/docs/devel/static/trigger-interface.html. The
> current chapter on event triggers might as well be ripped out and folded
> into the CREATE EVENT TRIGGER reference page, because it explains nothing
> about programming those triggers.

I'm not sure about ripping it out; that does not sound like a good idea to me. It does need some additions and C-level examples, yes.

The plan was to build a contrib module as an example that would cancel any (supported) command you try to run by means of ereport(ERROR, …), then add that in pieces to the docs with details about what's going on. While the commit fest was still running didn't look like the right time to work on that; beta looks like the right time.

What do you think about that proposal?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao masao.fu...@gmail.com wrote:
>> At fast shutdown, after walsender sends the checkpoint record and closes
>> the replication connection, walreceiver can detect the close of connection
>> before receiving all WAL records. This means that, even if walsender sends
>> all WAL records, walreceiver cannot always receive all of them.
>
> That sounds like a bug in walreceiver to me. The following code in
> walreceiver's main loop looks suspicious:
>
>     /*
>      * Process the received data, and any subsequent data we
>      * can read without blocking.
>      */
>     for (;;)
>     {
>         if (len > 0)
>         {
>             /* Something was received from master, so reset timeout */
>             ...
>             XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
>         }
>         else if (len == 0)
>             break;
>         else if (len < 0)
>         {
>             ereport(LOG,
>                     (errmsg("replication terminated by primary server"),
>                      errdetail("End of WAL reached on timeline %u at %X/%X",
>                                startpointTLI,
>                                (uint32) (LogstreamResult.Write >> 32),
>                                (uint32) LogstreamResult.Write)));
>             ...
>         }
>         len = walrcv_receive(0, &buf);
>     }
>
>     /* Let the master know that we received some data. */
>     XLogWalRcvSendReply(false, false);
>
>     /*
>      * If we've written some records, flush them to disk and
>      * let the startup process and primary server know about
>      * them.
>      */
>     XLogWalRcvFlush(false);
>
> The loop at the top looks fine - it specifically avoids throwing an error
> on EOF. But the code then proceeds to XLogWalRcvSendReply() which doesn't
> seem to have the same smarts - it simply does
>
>     if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>         PQflush(streamConn))
>         ereport(ERROR,
>                 (errmsg("could not send data to WAL stream: %s",
>                         PQerrorMessage(streamConn))));
>
> Unless I'm missing something, that certainly seems to explain how a
> standby can lag behind even after a controlled shutdown of the master.

Do you mean to say that, as an error has occurred, it would not be able to flush the received WAL, which could result in loss of WAL? I think even if an error occurs, it will call flush in WalRcvDie() before terminating the walreceiver.
With Regards,
Amit Kapila.
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote:
> Do you mean to say that, as an error has occurred, it would not be able to
> flush the received WAL, which could result in loss of WAL? I think even if
> an error occurs, it will call flush in WalRcvDie() before terminating the
> walreceiver.

Hm, true, but for that to prevent the problem the inner processing loop needs to always read up to EOF before it exits and we attempt to send a reply, which I don't think it necessarily does.

Assume that the master sends a chunk of data, waits a bit, and finally sends the shutdown record and exits. The slave might then receive the first chunk, and it might trigger sending a reply. At the time the reply is sent, the master has already sent the shutdown record and closed the connection, and we'll thus fail to reply and abort. Since the shutdown record has never been read from the socket, XLogWalRcvFlush won't flush it, and the slave ends up behind the master.

Also, since XLogWalRcvProcessMsg responds to keep-alive messages, we might also error out of the inner processing loop if the server closes the socket after sending a keepalive but before we attempt to respond.

Fixing this on the receive side alone seems quite messy and fragile. So instead, I think we should let the master send a shutdown message after it has sent everything it wants to send, and wait for the client to acknowledge it before shutting down the socket. If the client fails to respond, we could log a fat WARNING.

best regards,
Florian Pflug
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck singularsyn...@gmail.com wrote:
> Many of the links in the TODO wiki page result in a page not found error.
> Is this page up-to-date? Can anything be inferred about the status of
> these items from the broken link?

I think what we can infer is that the new archives code is broken. I hope someone is planning to fix that. If there's been some decision made that we don't have to support the historical URLs for our archives pages, I think that's a really bad plan; those links are in a lot more places than just the Todo.

As for your actual question, the TODO list is an accumulation of items that someone, sometime over the last ten years, thought would be valuable to work on, and nobody objected too strenuously to the idea. The fact that an item is in the TODO list doesn't mean that anyone is working on it now, that anyone will ever work on it, or that people would still think it was a good idea if it were re-proposed today. It's just a list to jump-start people's thinking, and shouldn't be taken as particularly official.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 2:13 PM, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck
> singularsyn...@gmail.com wrote:
>> Many of the links in the TODO wiki page result in a page not found error.
>> Is this page up-to-date? Can anything be inferred about the status of
>> these items from the broken link?
>
> I think what we can infer is that the new archives code is broken. I hope
> someone is planning to fix that.

Yes, we can infer that. It makes it a whole lot easier to fix something with better bug reports than that, of course, as I'm sure you (Robert in this case, not Stephen) are generally aware.

I've reverted a patch that was applied a few days ago that dealt with how URLs are parsed, and I think that's the one that's responsible. But it would be good to have an actual example of what didn't work, because the links I tried all worked...

> If there's been some decision made that we don't have to support the
> historical URLs for our archives pages, I think that's a really bad plan;
> those links are in a lot more places than just the Todo.

No, the plan has always been to support those. There are no plans to remove that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote:
> Hm, true, but for that to prevent the problem the inner processing loop
> needs to always read up to EOF before it exits and we attempt to send a
> reply, which I don't think it necessarily does.
>
> Assume that the master sends a chunk of data, waits a bit, and finally
> sends the shutdown record and exits. The slave might then receive the
> first chunk, and it might trigger sending a reply. At the time the reply
> is sent, the master has already sent the shutdown record and closed the
> connection, and we'll thus fail to reply and abort. Since the shutdown
> record has never been read from the socket, XLogWalRcvFlush won't flush
> it, and the slave ends up behind the master.
>
> Also, since XLogWalRcvProcessMsg responds to keep-alive messages, we might
> also error out of the inner processing loop if the server closes the
> socket after sending a keepalive but before we attempt to respond.
>
> Fixing this on the receive side alone seems quite messy and fragile. So
> instead, I think we should let the master send a shutdown message after it
> has sent everything it wants to send, and wait for the client to
> acknowledge it before shutting down the socket. If the client fails to
> respond, we could log a fat WARNING.

Your explanation seems okay, but before discussing the exact solution it might be better to first reproduce the actual problem.

With Regards,
Amit Kapila.
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 8:48 AM, Magnus Hagander mag...@hagander.net wrote:
> Yes, we can infer that. It makes it a whole lot easier to fix something
> with better bug reports than that, of course, as I'm sure you (Robert in
> this case, not Stephen) are generally aware.
>
> I've reverted a patch that was applied a few days ago that dealt with how
> URLs are parsed, and I think that's the one that's responsible. But it
> would be good to have an actual example of what didn't work, because the
> links I tried all worked...

Hmm. Sorry for the lack of detail. I assumed the problem was obvious and widespread because I clicked on the first link I saw in the Todo and it didn't work. But after clicking a bunch more links from the Todo, I only found three that fail:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php
http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

That last one works if I change %40 to @, so that one might be a wiki problem rather than an archives problem. In fact, for all I know the other two might have been broken all along too; I'm just assuming they used to work.

Sorry for going overboard,
...Robert

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 3:14 PM, Robert Haas robertmh...@gmail.com wrote:
> Hmm. Sorry for the lack of detail. I assumed the problem was obvious and
> widespread because I clicked on the first link I saw in the Todo and it
> didn't work. But after clicking a bunch more links from the Todo, I only
> found three that fail:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php

Works now, so that seems to have been fixed by the reverting of the patch. It might be a while before they all recover due to caching issues, but both of these work now for me, which seems to indicate the fix is the right one.

> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

It works with %40 for me now, so it might have been related - can you check if it is still an issue for you? It might be different in different browsers.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] TODO links broken?
On Wed, Apr 17, 2013 at 9:23 AM, Magnus Hagander mag...@hagander.net wrote:
> Works now, so that seems to have been fixed by the reverting of the patch.
> It might be a while before they all recover due to caching issues, but
> both of these work now for me, which seems to indicate the fix is the
> right one.
>
>> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/
>
> It works with %40 for me now, so it might have been related - can you
> check if it is still an issue for you? It might be different in different
> browsers.

Yeah, it seems OK now. Thanks for the quick response.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Fri, Apr 12, 2013 at 2:44 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> Yes, of course. The attached one replaces the getObjectDescription in
> sepgsql/proc.c, and relative changes in regression test.

Thanks. Committed. I also committed the first two hunks of your cleanup patch but omitted the third one, which is not well-worded and seems like overkill anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] event trigger API documentation?
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
> I'm not sure about ripping it out; that does not sound like a good idea to
> me. It does need some additions and C-level examples, yes.
>
> The plan was to build a contrib module as an example that would cancel any
> (supported) command you try to run by means of ereport(ERROR, …), then add
> that in pieces to the docs with details about what's going on. While the
> commit fest was still running didn't look like the right time to work on
> that; beta looks like the right time.
>
> What do you think about that proposal?

We're not adding new contrib modules during beta. Expanding the documentation seems like a fine beta-period activity, though.

regards, tom lane
Re: [HACKERS] erroneous restore into pg_catalog schema
On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Tue, Jan 29, 2013 at 2:30 PM, Alvaro Herrera wrote:
>>> Robert, are you working on this?
>>
>> I wasn't, but I can, if we agree on it.
>
> I think we need to do *something* (and accordingly have added this to the
> 9.3 open items page so we don't forget about it). Whether Robert's idea is
> the best one probably depends in part on how clean the patch turns out to
> be.

The attached patch attempts to implement this. I discovered that, in fact, we have a number of places in our initdb-time scripts that rely on the current behavior, but they weren't hard to fix; and in fact I think the extra verbosity is probably not a bad thing here. See what you think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: explicit-pg-catalog.patch (binary data)
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 2:26 AM, Florian Pflug f...@phlo.org wrote:

This raises two questions. First, why are there two primes? You could just as well use a single prime q and set p = q^64 mod 2^16. You then get

    S = sum V[i,j] * q^(64*(64-i) + (64-j))
      = sum V[i,j] * q^(4096 - 64*(i-1) - j)

You get higher prime powers that way, but you can easily choose a prime that yields distinct values mod 2^16 for exponents up to 16383. Your PRIME2, for example, does. (It wraps around for 16384, i.e. PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime, since 16384 is the Carmichael function's value at 2^16.)

The experimental detection rate is about the same if we use a single prime. But I think you have the analytical form wrong here. It should be, given q = p:

    S = sum V[i,j] * p^(64-i) * p^(64-j)
      = sum V[i,j] * p^(64 - i + 64 - j)
      = sum V[i,j] * p^(128 - i - j)

Yeah, if you set q = p that's true. My suggestion was p = q^64 though...

So it was; I guess it was too late here and I missed it. All things considered that is a good suggestion — if for nothing else, the generic implementation can be smaller this way.

Second, why does it use addition instead of XOR? It seems that FNV usually XORs the terms together instead of adding them?

Testing showed a slightly better detection rate for adds. Intuitively I think it's because the carry introduces some additional mixing.

Hm, but OTOH it makes S linear in V, i.e. if you have two inputs V1, V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then S' = S*m. The second property is quite undesirable, I think. Assume all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit positions 0..(k-1). Then, due to linearity, S is also divisible by 2^k, i.e. also has no ones before the k-th bit. This means, for example, that if you hash values which all have their lowest bit cleared, you get only 2^15 distinct hash values.
If they all have the two lowest bits cleared, you get only 2^14 distinct values, and so on… Generally, linearity doesn't seem to be a property that one wants in a hash, I think, so my suggestion is to stick to XOR.

This made me remember, the issue I had was with high-order bits, not with low-order ones; somehow I got them confused. The exact issue is that the high-order bits don't affect any bit lower than them. It's easy to see that if you remember the shift-and-add nature of multiplication. Unfortunately XOR will not fix that, and neither will adding an offset basis. This is the fundamental thing behind the not-so-great uncorrelated bit error detection rate.

While I understand that linearity is not a desirable property, I couldn't think of a realistic case where it would hurt. I can see how it can hurt checksums of variable-length values, but for our fixed-buffer case it's definitely not so clear cut. On the pro side, the distributive property that is behind linearity allowed me to do the final aggregation in tree form, performing the multiplies in parallel instead of linearly. This adds up to the difference between 250 cycles (64*(3-cycle IMUL + 1-cycle XOR)) and 25 cycles (4*5-cycle pmullw + 5-cycle addw). Given that the main loop is about 576 cycles, this is a significant difference.

Here, btw, is a page on FNV hashing. It mentions a few rules for picking suitable primes: http://www.isthe.com/chongo/tech/comp/fnv Unfortunately the rules don't apply here because of the hash size.

Yeah :-(. I noticed that their 32-bit prime only has a single one bit outside the first 16 bits. Maybe we can take advantage of that and use a 32-bit state while still providing decent performance on machines without a 32-bit × 32-bit → 32-bit multiply instruction?

Looking at the Power instruction set, a 32-bit multiply by the FNV prime would look like this:

    vmulouh   tmp1, hash, prime
    vmladduhm tmp1, hash, prime16
    vslw      tmp2, hash, 24
    vadduwm   hash, tmp1, tmp2

That is 4 instructions to multiply 4 values.
Depending on the specific execution ports on the processor it might be faster or slower than the scalar version, but not by a whole lot. The main benefit would be that the intermediate state could be held in registers.

If we lived in an Intel-only world, I'd suggest going with a 32-bit state, since SSE4.1 support is *very* widespread already - the last CPUs without it came out over 5 years ago, I think. (Core2 and later support SSE4.1, and some later Core1 do too.) But unfortunately things look bleak even for other x86 implementations - AMD supports SSE4.1 only starting with Bulldozer, which came out in 2011 or so, I believe. Leaving the x86 realm, it seems that only ARM's NEON provides the instructions we'd need - AltiVec seems to support only 16-bit multiplies, and from what some quick googling brought up, MIPS and SPARC SIMD instructions look no better. OTOH, chances are that nobody will ever do SIMD implementations for those machines. In that case, working in 32-bit chunks instead of
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 05:47:55PM +0300, Ants Aasma wrote:
> The SSE4.1 implementation of this would be as fast as the last patch, the
> generic version will be faster, and we avoid the linearity issue. By using
> different offsets for each of the partial hashes we don't directly suffer
> from the commutativity of the final xor folding. By using the
> xor-then-multiply variant, the last values hashed have their bits mixed
> before folding together.
>
> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86s that support SSE4.1, and
> the fact that AMD CPUs before 2011 will run it an order of magnitude
> slower (but still faster than the best CRC).
>
> Any opinions on whether it would be a reasonable tradeoff to have a better
> checksum with great performance on the latest x86 CPUs and good
> performance on other architectures, at the expense of having only OK
> performance on older AMD CPUs? Also, any good suggestions for where we
> should do CPU detection if we go this route?

As much as I love the idea of improving the algorithm, it is disturbing that we are discussing this so close to beta, with an algorithm that is under analysis, with no (runtime) CPU detection, and in something that is going to be embedded into our data page format. I can't even think of another case where we do run-time CPU detection.

I am wondering if we need to tell users that pg_upgrade will not be possible if you enable page-level checksums, so we are not trapped with something we want to improve in 9.4.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 17:09 , Bruce Momjian br...@momjian.us wrote:
> As much as I love the idea of improving the algorithm, it is disturbing
> that we are discussing this so close to beta, with an algorithm that is
> under analysis, with no (runtime) CPU detection, and in something that is
> going to be embedded into our data page format. I can't even think of
> another case where we do run-time CPU detection.

We could still ship the new checksum algorithm with 9.3, but omit the SSE-optimized version, i.e. include only the plain C implementation. I think Ants mentioned somewhere that gcc does a pretty good job of vectorizing that, so people who really care (and who use GCC) could compile with -msse4.1 -funroll-loops -ftree-vectorize and get performance close to that of a hand-coded SSE version.

The important decision we're facing is which algorithm to use. I personally believe Ants is on the right track there - FNV or a variant thereof looks like a good choice to me, but the details have yet to be nailed down, I think.

However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for that - we could make version 5 mean "just like version 4, but with a different checksum algorithm". Since the layout wouldn't actually change, that'd be far easier to pull off than actually supporting multiple page layouts.

If that works, then shipping 9.3 with CRC is probably the best solution. If not, we should see to it that something like Ants' parallel version of FNV, or a small variant of it, gets into 9.3 if at all possible, IMHO.

best regards,
Florian Pflug
Re: [HACKERS] event trigger API documentation?
On 4/17/13 5:41 AM, Dimitri Fontaine wrote:
> I'm not sure about ripping it out; that does not sound like a good idea to
> me. It does need some additions and C-level examples, yes.
>
> The plan was to build a contrib module as an example that would cancel any
> (supported) command you try to run by means of ereport(ERROR, …), then add
> that in pieces to the docs with details about what's going on. While the
> commit fest was still running didn't look like the right time to work on
> that; beta looks like the right time.

Well, if documentation had been available well before beta, other procedural languages might have gained support for event triggers. If it's not being documented, it might not happen very soon.

It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited.
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote:
> This made me remember, the issue I had was with high-order bits, not with
> low-order ones; somehow I got them confused. The exact issue is that the
> high-order bits don't affect any bit lower than them. It's easy to see
> that if you remember the shift-and-add nature of multiplication.
> Unfortunately XOR will not fix that, and neither will adding an offset
> basis. This is the fundamental thing behind the not-so-great uncorrelated
> bit error detection rate.

Right. We could maybe fix that by extending the update step to

    t = s[j] ^ d[i,j]
    s[j] = (t * PRIME) ^ (t >> 1)

or something like that. Shifting t instead of (t * PRIME) should help to reduce the performance impact, since a reordering CPU should be able to parallelize the multiply and the shift. Note though that I haven't really thought that through extensively - the general idea should be sound, but whether 1 is a good shift amount I do not know.

> While I understand that linearity is not a desirable property, I couldn't
> think of a realistic case where it would hurt. I can see how it can hurt
> checksums of variable-length values, but for our fixed-buffer case it's
> definitely not so clear cut. On the pro side, the distributive property
> that is behind linearity allowed me to do the final aggregation in tree
> form, performing the multiplies in parallel instead of linearly. This adds
> up to the difference between 250 cycles (64*(3-cycle IMUL + 1-cycle XOR))
> and 25 cycles (4*5-cycle pmullw + 5-cycle addw). Given that the main loop
> is about 576 cycles, this is a significant difference.
>
> I wonder, if we use 32-bit FNV-1a's (the h = (h^v)*p variant) with
> different offset-basis values, would it be enough to just XOR fold the
> resulting values together. The algorithm looking like this:

Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly.
But if we're worried about that, we could use your linear combination method for the aggregation phase.

> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86s that support SSE4.1, and
> the fact that AMD CPUs before 2011 will run it an order of magnitude
> slower (but still faster than the best CRC).

Hm, CPU detection isn't that hard, and given the speed at which Intel currently invents new instructions we'll end up going that route sooner or later anyway, I think.

> Any opinions on whether it would be a reasonable tradeoff to have a better
> checksum with great performance on the latest x86 CPUs and good
> performance on other architectures at the expense of having only OK
> performance on older AMD CPUs?

The loss on AMD is offset by the increased performance on machines where we can't vectorize, I'd say.

best regards,
Florian Pflug
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote:
> Is there any way we can change the checksum algorithm in 9.4 *without*
> breaking pg_upgrade?

Personally, I think we're going to need a solution for page format changes someday. What advantages are we postponing now to avoid it?

* 32-bit checksums?
* Being able to enable/disable checksums?

Anything else?

--
greg
[HACKERS] Changing schema on the fly
Hello dear -hackers,

I'm maintaining pg_reorg/pg_repack, which, as you may know, effectively allows online VACUUM FULL or CLUSTER. It works by installing logging triggers to keep data up to date during the migration, creating a copy of the table, and eventually swapping the tables' relfilenodes.

The new table is forced to keep exactly the same physical structure, e.g. restoring dropped columns too. Failing to do so was apparently a big mistake, looking at this commit [1]. My knowledge of Postgres at that level is limited: what I imagine is that cached plans keep the offset of the field in the row, so data ends up read/written in the wrong position if such an offset changes. The commit message mentions views and stored procedures being affected.

Is there a way to force invalidation of all the caches that may hold a reference to the column offsets? Or is the problem an entirely different one, and the above cache invalidation wouldn't be enough?

If we managed to allow schema changes in pg_repack we could offer many more online manipulation features: changing data types, reordering columns, really dropping columns to free up space, etc.

Thank you very much,

-- Daniele

[1] https://github.com/reorg/pg_repack/commit/960930b645df8eeeda15f176c95d3e450786f78a
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote:
> However, you're right that time's running out. It'd be a shame though if
> we'd lock ourselves into CRC as the only available algorithm essentially
> forever. Is there any way we can change the checksum algorithm in 9.4
> *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for
> that - we could make version 5 mean "just like version 4, but with a
> different checksum algorithm". Since the layout wouldn't actually change,
> that'd be far easier to pull off than actually supporting multiple page
> layouts.
>
> If that works, then shipping 9.3 with CRC is probably the best solution.
> If not, we should see to it that something like Ants' parallel version of
> FNV, or a small variant of it, gets into 9.3 if at all possible, IMHO.

I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from a checksum to a no-checksum server.

Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems a no-checksum server could run just fine with checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose the checksum ability.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 18:15 , Bruce Momjian br...@momjian.us wrote: On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote: However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used for that - we could make version 5 mean just like version 4, but with a different checksum algorithm. Since the layout wouldn't actually change, that'd be far easier to pull off than actually supporting multiple page layouts. If that works, then shipping 9.3 with CRC is probably the best solution. If not, we should see to it that something like Ants' parallel version of FNV or a smaller checksum get into 9.3 if at all possible, IMHO. I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from checksum to no-checksum servers. Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems pg_upgrade could run as a no-checksum server just fine that had checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose checksum ability. AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 06:33:58PM +0200, Florian Pflug wrote: I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade to go from checksum to no-checksum servers. Does the backend look at the pg_controldata setting, or at the page checksum flag? If the former, it seems pg_upgrade could run as a no-checksum server just fine that had checksum information on its pages. This might give us more flexibility in changing the checksum algorithm in the future, i.e. you only lose checksum ability. AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually. What advantages are we postponing now to avoid it? Um, other than the ability to make a release? We aren't going to hold up 9.3 until that particular bit of pie in the sky lands. Indeed I don't expect to see it available in the next couple years either. When we were looking at that seriously, two or three years ago, arbitrary page format changes looked *hard*. The idea of bumping the page format version number to signal a checksum algorithm change might work though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
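Tom's suggestion -- letting the page-format version number select the checksum algorithm while the layout stays unchanged -- could be sketched roughly as below. All function names, version semantics, and return values here are hypothetical, not actual PostgreSQL code:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t (*checksum_fn)(const void *page, size_t len);

/* Placeholder implementations; real code would compute an actual
 * checksum over the page contents. */
static uint16_t checksum_v4(const void *page, size_t len) { (void) page; (void) len; return 4; }
static uint16_t checksum_v5(const void *page, size_t len) { (void) page; (void) len; return 5; }

/* Hypothetical dispatch: the same page layout, but the value stored in
 * pd_pagesize_version picks the checksum algorithm. */
static checksum_fn checksum_for_page_version(uint16_t version)
{
    switch (version)
    {
        case 4:  return checksum_v4;  /* existing algorithm */
        case 5:  return checksum_v5;  /* same layout, new algorithm */
        default: return NULL;         /* unknown page version */
    }
}
```

Since the layout is identical for both versions, the server could verify and rewrite pages on the fly, which is what makes this cheaper than supporting genuinely different page layouts.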
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:22:01PM -0400, Tom Lane wrote: Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually. What advantages are we postponing now to avoid it? Um, other than the ability to make a release? We aren't going to hold up 9.3 until that particular bit of pie in the sky lands. Indeed I don't expect to see it available in the next couple years either. When we were looking at that seriously, two or three years ago, arbitrary page format changes looked *hard*. The idea of bumping the page format version number to signal a checksum algorithm change might work though. Uh, not sure how pg_upgrade would detect that, as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled Do we need to address this for 9.3? (Yuck) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty shortsighted. The field probably ought to be defined as containing a checksum algorithm ID number, not a boolean. But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] erroneous restore into pg_catalog schema
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think we need to do *something* (and accordingly have added this to the 9.3 open items page so we don't forget about it). Whether Robert's idea is the best one probably depends in part on how clean the patch turns out to be. The attached patch attempts to implement this. I discovered that, in fact, we have a number of places in our initdb-time scripts that rely on the current behavior, but they weren't hard to fix; and in fact I think the extra verbosity is probably not a bad thing here. See what you think. I think this breaks contrib/adminpack, and perhaps other extensions. They'd not be hard to fix with script changes, but they'd be broken. In general, we would now have a situation where relocatable extensions could never be installed into pg_catalog. That might be OK, but at least it would need to be documented. Also, I think we'd be pretty much hard-wiring the decision that pg_dump will never dump objects in pg_catalog, because its method for selecting the creation schema won't work in that case. That probably is all right too, but we need to realize it's a consequence of this. As far as the code goes, OK except I strongly disapprove of removing the comment about temp_missing at line 3512. The coding is not any less a hack in that respect for having been pushed into a subroutine. If you want to rewrite the comment, fine, but failing to point out that something funny is going on is not a service to readers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty shortsighted. The field probably ought to be defined as containing a checksum algorithm ID number, not a boolean. But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. Well, my idea was that pg_upgrade would allow upgrades from old clusters with the same checksum algorithm version, but not non-matching ones. This would allow the checksum algorithm to be changed and force pg_upgrade to fail. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Bruce Momjian br...@momjian.us writes: On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might do about checksum algorithm changes would have to be something that can be managed on-the-fly by the newer server. Well, my idea was that pg_upgrade would allow upgrades from old clusters with the same checksum algorithm version, but not non-matching ones. This would allow the checksum algorithm to be changed and force pg_upgrade to fail. It's rather premature to be defining pg_upgrade's behavior for a situation that doesn't exist yet, and may very well never exist in that form. It seems more likely to me that we'd want to allow incremental algorithm changes, in which case pg_upgrade ought not do anything about this case anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger API documentation?
Peter Eisentraut pete...@gmx.net writes: Well, if documentation had been available well before beta, other procedural languages might have gained support for event triggers. If it's not being documented, it might not happen very soon. It's been a moving target for the last two years, and until very recently what to document was not clear enough to spend any time on actually writing the docs. Please also note that the first series of patches did include the support code for all the core PLs, but Robert didn't feel like committing that, and no other committer stepped up. I'm struggling to understand how to properly solve the problem here from an organisational perspective. Before beta was not a good time for the people involved, and it was not a good time for other people to get involved. Beta is not a good time to fix what couldn't be done before. When are we supposed to work on the rough edges left when a patch went through 8 commit fests and so many discussions that it's quite hard indeed to step back and understand what's in and what's missing to make it sensible for the release? Maybe the right answer is to remove the documentation about event triggers completely for 9.3 and tell the users about them later, when we have something other than just internal infrastructure. Now, if it's ok to add support to the other PLs, I can cook a patch from the bits I had done last year; the only work should be removing variables. It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited. Well, you do realise that the only information you get passed down to the event trigger code explicitly are the event name and the command tag, and nothing else, right? 
If you have a use case that requires any other information, then documenting the event triggers will do nothing to help you implement it; you will need to code in C and go look at the backend sources. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix typo in contrib/hstore/crc32.c comment
Hi all, The attached patch fixes a little typo in a contrib/hstore/crc32.c comment. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello fix_typo_hstore_crc32_comment.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if someone has checksums enabled and wants to disable it, why is pg_upgrade the right time to do that? Wouldn't it make more sense to allow them to do that at any time? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-04-17 at 16:58 +0100, Greg Stark wrote: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually What advantages are we postponing now to avoid it? * 32-bit checksums? * Being able to enable/disable checksums? Anything else? I'm not sure that changing the page format is the most difficult part of enabling/disabling checksums. It's easy enough to have page header bits if the current information is not enough (and those bits were there, but Heikki requested their removal and I couldn't think of a concrete reason to keep them). Eventually, it would be nice to be able to break the page format and have more space for things like checksums (and probably a few other things, maybe some visibility-related optimizations). But that's a few years off and we don't have any real plan for that. What I wanted to accomplish with this patch is the simplest checksum mechanism that we could get that would be fast enough that many people would be able to use it. I expect it to be useful until we do decide to break the page format. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] event trigger API documentation?
On 4/17/13 3:20 PM, Dimitri Fontaine wrote: It would have been good to have at least one untrusted language with event trigger support, so that you can hook in external auditing or logging systems. With the existing PL/pgSQL support, the possible actions are a bit limited. Well, you do realise that the only information you get passed down to the event trigger code explicitely are the event name and the command tag, and nothing else, right? Offhand, that seems about enough, but I'm just beginning to explore. Chances are, event triggers will end up somewhere near the top of the release announcements, so we should have a consistent message about what to do with them and how to use them. If for now, we say, we only support writing them in PL/pgSQL, and here is how to do that, and here are some examples, that's fine. But currently, it's not quite clear. Surely you had some use cases in mind when you set out to implement this. What were they, and where are we now in relation to them? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I think his point is why don't we clear currval() on DISCARD ALL? I can't think of a good reason we don't. Because we'd have to invent a new suboperation DISCARD SEQUENCES, for one thing, in order to be consistent. I'd rather ask why it's important that we should throw away such state. It doesn't seem to me to be important enough to justify a new subcommand. Consistency is a philosophical thing. The practical reason for subcommands is the possibility of a partial reset for special situations, pooling or otherwise. But such usage seems rather rare in real life. If the sequences are not worth a subcommand, then let's not give them one and just wait until someone comes up with an actual reason to have one. But currval() is quite a noticeable thing that DISCARD ALL should clear. Or, if you'd rather a more direct answer: wanting this sounds like evidence of bad application design. Why is your app dependent on getting failures from currval, and isn't there a better way to do it? It may not sound like it, but that's exactly the request - because DISCARD ALL leaves user-visible state around, it's hard to fix an application that depends on broken assumptions. In fact, it was a surprise to me that currval() works across transactions. My alternative proposal would be to get rid of such silly behaviour... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 01:59:12PM -0700, Jeff Davis wrote: On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people have enabled checksums in the first place? Good point, but it is _an_ option, at least. I would like to know the answer to how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if someone has checksums enabled and wants to disable it, why is pg_upgrade the right time to do that? Wouldn't it make more sense to allow them to do that at any time? Well, right now, pg_upgrade is the only way you could potentially turn off checksums. You are right that we might eventually want a command, but my point is that we currently have a limitation in pg_upgrade that might not be necessary. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 6:54 PM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote: This made me remember, the issue I had was with high order bits, not with low order ones; somehow I got them confused. The exact issue is that the high order bits don't affect any bit lower than them. It's easy to see that if you remember the shift-and-add nature of multiply. Unfortunately XOR will not fix that. Neither will adding an offset basis. This is the fundamental thing that is behind the not-so-great uncorrelated bit error detection rate. Right. We could maybe fix that by extending the update step to t = s[j] ^ d[i,j]; s[j] = (t * PRIME) ^ (t >> 1) or something like that. Shifting t instead of (t * PRIME) should help to reduce the performance impact, since a reordering CPU should be able to parallelize the multiply and the shift. Note though that I haven't really thought that through extensively - the general idea should be sound, but whether 1 is a good shift amount I do not know. I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. I went ahead and coded up both the parallel FNV-1a and parallel FNV-1a + srl1-xor variants and ran performance tests and detection rate tests on both. 
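For concreteness, Florian's proposed update step written as scalar C looks like the sketch below. This is illustrative only -- the actual patch runs many such lanes in parallel with SSE on 16-bit words, and the prime here is the standard 32-bit FNV prime, which is an assumption rather than the patch's exact constant:

```c
#include <stdint.h>

#define FNV_PRIME 16777619u   /* standard 32-bit FNV prime (assumption) */

/* One update step of FNV-1a extended with a shifted feedback term: the
 * "(t >> 1)" xor lets high-order bits influence lower-order ones, which
 * the plain xor-and-multiply step does not. */
static uint32_t fnv1a_srl1_step(uint32_t sum, uint32_t data)
{
    uint32_t t = sum ^ data;
    return (t * FNV_PRIME) ^ (t >> 1);
}
```

Because the multiply and the shift both depend only on t, an out-of-order CPU can execute them in parallel, which is why shifting t rather than (t * PRIME) is cheaper.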
Performance results:

    Mul-add checksums:  12.9 bytes/cycle
    FNV-1a checksums:   13.5 bytes/cycle
    FNV-1a + srl-1:      7.4 bytes/cycle

Detection rates (false positive rates):

                      Add-mul      FNV-1a    FNV-1a + srl-1
    Single bit flip:  1:inf        1:129590  1:64795
    Double bit flip:  1:148        1:511     1:53083
    Triple bit flip:  1:673        1:5060    1:61511
    Quad bit flip:    1:1872       1:19349   1:68320
    Write 0x00 byte:  1:774538137  1:118776  1:68952
    Write 0xFF byte:  1:165399500  1:137489  1:68958
    Partial write:    1:59949      1:71939   1:89923
    Write garbage:    1:64866      1:64980   1:67732
    Write run of 00:  1:57077      1:61140   1:59723
    Write run of FF:  1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to the end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes. So adding in the shifted value nearly cuts the performance in half. I think that by playing with the instruction order I might coax the CPU scheduler into scheduling the instructions better, but even in the best case it will be somewhat slower. The point to keep in mind is that even this slower speed is still faster than hardware-accelerated CRC32, so all in all the hit might not be so bad. The effect on false positive rates for double bit errors is particularly impressive. I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster. I wonder: if we used 32-bit FNV-1a's (the h = (h^v)*p variant) with different offset-basis values, would it be enough to just XOR-fold the resulting values together? The algorithm would look like this: Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly. 
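Ants's question -- whether xor-folding independent 32-bit FNV-1a lanes seeded with distinct offset bases is an adequate aggregation step -- can be sketched as below. The lane count and the offset-basis values are illustrative only, not the patch's actual parameters:

```c
#include <stdint.h>
#include <stddef.h>

#define FNV_PRIME 16777619u
#define N_LANES 4   /* illustrative; the real patch uses enough lanes for 128 bytes */

/* Stripe 32-bit words across N_LANES independent FNV-1a accumulators,
 * each seeded with its own offset basis, then xor-fold the partial sums
 * into a single 32-bit result. */
static uint32_t fnv1a_xor_folded(const uint32_t *words, size_t nwords)
{
    /* distinct arbitrary offset bases derived from the standard one */
    uint32_t s[N_LANES] = {2166136261u, 2166136262u, 2166136263u, 2166136264u};
    uint32_t result = 0;

    for (size_t i = 0; i + N_LANES <= nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
            s[j] = (s[j] ^ words[i + j]) * FNV_PRIME;   /* h = (h ^ v) * p */

    for (size_t j = 0; j < N_LANES; j++)
        result ^= s[j];
    return result;
}
```

As Florian notes, the plain xor fold is slightly weaker against permutations that swap words between lanes; the different offset bases mitigate that, since identical inputs in different lanes still yield different partial sums.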
But if we're worried about that, we could use your linear combination method for the aggregation phase. I don't think it significantly reduces resilience to permutations thanks to using different basis offsets and multiply not distributing over xor. Speaking against this option is the fact that we will need to do CPU detection at startup to make it fast on the x86 that support SSE4.1, and the fact that AMD CPUs before 2011 will run it an order of magnitude slower (but still faster than the best CRC). Hm, CPU detection isn't that hard, and given the speed at which Intel currently invents new instructions we'll end up going that route sooner or later anyway, I think. Sure it's not that hard but it does have an order of magnitude more design decisions than #if defined(__x86_64__). Maybe a first stab could avoid a generic infrastructure and just have the checksum function as a function pointer, with the default trampoline implementation running a cpuid and overwriting the function pointer with either the optimized or generic versions and then calling it. Any opinions if it would be a reasonable tradeoff to have a better checksum with great performance on latest x86 CPUs and good performance on other architectures at the expense of having only ok performance on older AMD CPUs? The loss on AMD is offset by the increased performance on machines where we can't vectorize, I'd say. +1 Old AMD machines won't soon be used by anyone caring
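The function-pointer trampoline Ants describes might look like the following minimal sketch. The CPU probe is stubbed out here (a real build would use CPUID or a compiler builtin), and the checksum bodies are placeholders, not the patch's algorithm:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t (*checksum_impl)(const void *data, size_t len);

/* Placeholder scalar implementation; a real build would pair it with a
 * vectorized SSE4.1 variant. */
static uint32_t checksum_generic(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + p[i];
    return sum;
}

static uint32_t checksum_sse41(const void *data, size_t len)
{
    return checksum_generic(data, len);   /* stand-in for the vectorized path */
}

/* Stubbed CPU probe; a real build would execute CPUID here. */
static int cpu_has_sse41(void) { return 0; }

static uint32_t checksum_trampoline(const void *data, size_t len);

/* All callers go through this pointer. It starts out aimed at the
 * trampoline, which probes the CPU once, rebinds the pointer to the
 * chosen implementation, and calls through it; every later call skips
 * the probe entirely. */
static checksum_impl page_checksum = checksum_trampoline;

static uint32_t checksum_trampoline(const void *data, size_t len)
{
    page_checksum = cpu_has_sse41() ? checksum_sse41 : checksum_generic;
    return page_checksum(data, len);
}
```

This avoids any generic CPU-detection infrastructure: the dispatch cost is a single indirect call, paid on every invocation anyway, and the probe runs exactly once.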
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
Marko Kreen mark...@gmail.com writes: On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I think his point is why don't we clear currval() on DISCARD ALL? I can't think of a good reason we don't. Because we'd have to invent a new suboperation DISCARD SEQUENCES, for one thing, in order to be consistent. I'd rather ask why it's important that we should throw away such state. It doesn't seem to me to be important enough to justify a new subcommand. consistency is a philosophical thing. No, it's a critical tool in complexity management. When you're dealing with systems as complicated as a database, every little non-orthogonal detail adds up. DISCARD ALL has a clear definition in terms of simpler commands, and it's going to stay that way. Either this is worth a subcommand, or it's not worth worrying about at all. But currval() is quite noticeable thing that DISCARD ALL should clear. If it were as obvious and noticeable as all that, somebody would have noticed before now. We've had DISCARD ALL with its current meaning since 8.3, and nobody complained in the five-plus years since that shipped. At this point, even if a concrete case were made why DISCARD ALL should clear currval (and I repeat that no credible case has been made; nobody has for example pointed to a reasonably-well-designed application that this breaks), there would be a pretty strong backwards-compatibility argument not to change it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other latencies effectively executing one each of movdqu/pmullw/paddw each cycle, that's why the N_SUMS adds up to 128 bytes not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes. I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add \ns command to psql
On Tue, Apr 16, 2013 at 5:40 AM, Colin 't Hart co...@sharpheart.org wrote: Here's a new version of a small patch to psql I'm using locally. It adds a command \ns to psql which is a shortcut to set the SEARCH_PATH variable. I've also added tab completion making this command much more useful. I don't think tab completion would be possible if this command was defined as a variable (which was another suggestion offered at the time). It's possible that the tab completion argument is a sufficient reason for including this, but I'm kinda skeptical. The amount of typing saved is pretty minimal, considering that set sea<TAB> completes to set search_path. Assuming we had proper tab completion for set search_path = (and off-hand, it doesn't look like that does anything useful), this would be saving 5 keystrokes every time you want to change the search path (set sea<TAB> is eight keystrokes, where \ns<space> is four... but it also saves you the semicolon at the end). I'm sure some people would find that worthwhile, but personally, I don't. Short commands are cryptic, and IMHO psql is already an impenetrable thicket of difficult-to-remember abbreviations. I've been using it for more than 10 years now and I still have to run \? on a semi-regular basis. I think that if we start adding things like this, that help message is going to rapidly fill up with a whole lot more abbreviations for things that are quite a bit incrementally less useful than what's there right now. After all, if we're going to have \ns to set the search path, why not have something similar for work_mem, or random_page_cost? I set both of those variables more often than I set search_path; and there could easily be someone else out there whose favorite GUC is client_encoding or whatever. And, for that matter, why stop with GUCs? 
\ct for CREATE TABLE would save lots of typing, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Apr18, 2013, at 00:32 , Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. Micro-optimization for particular CPUs, yes, but general performance considerations, no. For example, 2^n is probably one of the worst moduli you can pick for a hash function - any prime would work much better. But doing the computations modulo 2^16 or 2^32 carries zero performance overhead, whereas picking another modulus requires some renormalization after every operation. That, however, is *not* a given - it stems from the fact that nearly all CPUs in existence operate on binary integers. This fact must thus enter into the design phase very early, and makes 2^16 or 2^32 a sensible choice for a modulus *despite* its shortcomings, simply because it allows for fast implementations. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes. I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. I'd say lots of attention is paid to that, but there's *also* attention paid to speed. Which is good, because ideally we want to end up with a checksum which both has good error-detection properties *and* good performance. 
If performance is of no concern to us, then there's little reason not to use CRC… And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. If you've got any pointers to literature on error-detection capabilities of CPU-friendly checksum functions, please share. I am aware of the vast literature on CRC, and also on some other algebraic approaches, but none of those even come close to the speed of FNV+shift (unless there's a special CRC instruction, that is). And there's also a ton of stuff on cryptographic hashing, but those are optimized for a completely different use-case... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
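Florian's modulus point can be made concrete with a tiny sketch (not from the thread; the constants and function names are illustrative): with a 2^16 modulus the reduction comes for free from the integer type, while a prime modulus needs an explicit remainder after each step.

```c
#include <stdint.h>

/* Illustrative only: a toy per-step hash update under two different
 * moduli. With modulus 2^16 the reduction is implicit in the 16-bit
 * type; with a prime modulus (65521, the Adler-32 prime, used here
 * purely as an example) an explicit renormalization is needed after
 * the multiply. */
static uint16_t step_mod_2_16(uint16_t h, uint16_t v)
{
    return (uint16_t) ((h + v) * 31u);              /* wraps mod 2^16 for free */
}

static uint16_t step_mod_prime(uint16_t h, uint16_t v)
{
    return (uint16_t) (((uint32_t) (h + v) * 31u) % 65521u);
}
```

On most hardware the `%` in the prime variant costs a multi-cycle divide (or a reciprocal multiply) on every step, which is exactly the renormalization overhead being discussed.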
Re: [HACKERS] Enabling Checksums
On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote:

Performance results:
Mul-add checksums: 12.9 bytes/cycle
FNV-1a checksums: 13.5 bytes/cycle
FNV-1a + srl-1: 7.4 bytes/cycle

Detection rates (false positive rates):

                 Add-mul      FNV-1a    FNV-1a + srl-1
Single bit flip: 1:inf        1:129590  1:64795
Double bit flip: 1:148        1:511     1:53083
Triple bit flip: 1:673        1:5060    1:61511
Quad bit flip:   1:1872       1:19349   1:68320
Write 0x00 byte: 1:774538137  1:118776  1:68952
Write 0xFF byte: 1:165399500  1:137489  1:68958
Partial write:   1:59949      1:71939   1:89923
Write garbage:   1:64866      1:64980   1:67732
Write run of 00: 1:57077      1:61140   1:59723
Write run of FF: 1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes.

Cool, thanks for testing that! The results for FNV-1a + srl-1 look promising, I think. Its failure rate is consistently about 1:2^16, which is the value you'd expect. That gives me some confidence that the additional shift is working as expected. BTW, which prime are you using for FNV-1a and FNV-1a+srl1? So adding in the shifted value nearly cuts the performance in half. I think that by playing with the instruction order I might coax the CPU scheduler to schedule the instructions better, but even in the best case it will be somewhat slower. The point to keep in mind is that even this slower speed is still faster than hardware-accelerated CRC32, so all in all the hit might not be so bad. Yeah. ~7 bytes/cycle still translates to over 10GB/s on a typical CPU, so that's still plenty fast I'd say... The effect on false positive rates for double bit errors is particularly impressive.
I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster. Maybe, but it also makes *only* bits 14 and 15 actually affect bits below them, because all others are shifted out. If you choose the right prime it may still work; you'd have to pick one with enough lower bits set so that every bit affects bit 14 or 15 at some point… All in all a small shift seems better to me - if 1 for some reason isn't a good choice, I'd expect 3 or so to be a suitable replacement, but nothing much larger… I should have some time tomorrow to spend on this, and will try to validate our FNV-1a modification, and see if I find a way to judge whether 1 is a good shift. I wonder if we use 32bit FNV-1a's (the h = (h^v)*p variant) with different offset-basis values, would it be enough to just XOR fold the resulting values together. The algorithm looking like this: Hm, this will make the algorithm less resilient to some particular input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th words), but those seem very unlikely to occur randomly. But if we're worried about that, we could use your linear combination method for the aggregation phase. I don't think it significantly reduces resilience to permutations, thanks to using different offset bases and multiply not distributing over xor. Oh, yeah, I thought you were still using 0 as the base offset. If you don't, the objection is moot. best regards, Florian Pflug
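For readers following along, the per-word step of the "FNV-1a + srl-1" variant under discussion can be sketched as below. This is a best-effort reading of the thread, not the exact code that was benchmarked; in particular the placement of the shift and the final folding to 16 bits are assumptions.

```c
#include <stdint.h>

#define FNV_PRIME 16777619u    /* the 32-bit FNV prime named in the thread */

/* One plausible per-word step: the standard FNV-1a round
 * h = (h ^ v) * p, followed by mixing in a copy of the hash shifted
 * right by one ("srl-1"), so that high bits feed back into the low
 * 16 bits that survive truncation. */
static uint32_t fnv1a_srl1_step(uint32_t h, uint32_t v)
{
    h = (h ^ v) * FNV_PRIME;
    h = h ^ (h >> 1);          /* the srl-1 mixing step */
    return h;
}

/* Checksum a buffer of 32-bit words and fold down to 16 bits. */
static uint16_t fnv1a_srl1(const uint32_t *words, int n)
{
    uint32_t h = 2166136261u;  /* standard FNV-1a offset basis */
    for (int i = 0; i < n; i++)
        h = fnv1a_srl1_step(h, words[i]);
    return (uint16_t) (h ^ (h >> 16));   /* fold to 16 bits */
}
```

The extra shift-xor is what the performance numbers above show costing nearly half the throughput: it lengthens the dependency chain of each round by one operation.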
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 1:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other's latencies, effectively executing one each of movdqu/pmullw/paddw each cycle; that's why the N_SUMS adds up to 128 bytes, not 16 bytes. The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. That ought to be, at best, a fifth-order consideration, with full recognition that it'll be obsolete in two years, and is already irrelevant to anyone not running one of those CPUs. The large scale structure takes into account the trends in computer architecture. A lot more so than using anything straight out of the literature. Specifically, computer architectures have hit a wall in terms of sequential throughput, so the linear dependency chain in the checksum algorithm will be the bottleneck soon if it isn't already. From that it follows that a fast and future-proof algorithm should not calculate the checksum in a single long chain. The proposed algorithms divide the input into 64x64 and 32x64 chunks. It's easy to show that both convert the dependency chain from O(n) to O(sqrt(n)). Secondly, unless we pick something really popular, CPUs are unlikely to provide for us specifically, so the algorithm should be built from general purpose computational pieces. Vector integer multiply and xor are pretty much guaranteed to be there and fast on future CPUs. In my view it's much more probable to be available and fast on future CPUs than something like the Intel CRC32 acceleration. I would like to ban all discussion of assembly-language optimizations until after 9.3 is out, so that we can concentrate on what actually matters. Which IMO is mostly the error detection rate and the probable nature of false successes.
I'm glad to see that you're paying at least some attention to that, but the priorities in this discussion are completely backwards. I approached it from the angle of what needs to be done to get a fundamentally fast approach to have a good enough error detection rate, and not have a way of generating false positives that will give a likely error. The algorithms are simple enough and well studied enough that the rewards from tweaking them are negligible. I think the resulting performance speaks for itself. Now the question is what is a good enough algorithm. In my view, the checksum is more like a canary in the coal mine, not something that can be relied upon, and so ultimate efficiency is not that important if there are no obvious horrible cases. I can see that there are other views and so am exploring different tradeoffs between performance and quality. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. I haven't found much literature that is of use here. There is theory underlying this, coming from basic number theory and distilled into rules for hash functions. For the FNV hash the prime supposedly is carefully chosen, although all the literature so far just says it is a good choice without explaining why. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
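The O(sqrt(n)) argument above comes from breaking the single multiply/xor dependency chain into many independent ones. A scalar C sketch of that structure (illustrative only; the real patch uses vectorized 64x64 / 32x64 blocking and a different aggregation step):

```c
#include <stdint.h>
#include <stddef.h>

#define N_SUMS 32              /* number of independent partial checksums */
#define FNV_PRIME 16777619u

/* Each of the N_SUMS accumulators only sees every N_SUMS-th word, so
 * the CPU can keep many multiply/xor chains in flight instead of one
 * serial chain. The per-word step, the distinct offset bases, and the
 * XOR-fold aggregation are illustrative, not the thread's exact code. */
uint32_t parallel_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_SUMS];
    size_t i;

    for (i = 0; i < N_SUMS; i++)
        sums[i] = (uint32_t) (i + 1);   /* distinct offset bases per lane */

    for (i = 0; i < nwords; i++)
        sums[i % N_SUMS] = (sums[i % N_SUMS] ^ words[i]) * FNV_PRIME;

    /* Aggregation phase: XOR-fold the partial sums together. */
    uint32_t result = 0;
    for (i = 0; i < N_SUMS; i++)
        result ^= sums[i];
    return result;
}
```

The point is structural: the inner loop's iterations for different lanes have no data dependencies on each other, which is what maps well onto both superscalar execution and SIMD.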
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 2:25 AM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote:

Performance results:
Mul-add checksums: 12.9 bytes/cycle
FNV-1a checksums: 13.5 bytes/cycle
FNV-1a + srl-1: 7.4 bytes/cycle

Detection rates (false positive rates):

                 Add-mul      FNV-1a    FNV-1a + srl-1
Single bit flip: 1:inf        1:129590  1:64795
Double bit flip: 1:148        1:511     1:53083
Triple bit flip: 1:673        1:5060    1:61511
Quad bit flip:   1:1872       1:19349   1:68320
Write 0x00 byte: 1:774538137  1:118776  1:68952
Write 0xFF byte: 1:165399500  1:137489  1:68958
Partial write:   1:59949      1:71939   1:89923
Write garbage:   1:64866      1:64980   1:67732
Write run of 00: 1:57077      1:61140   1:59723
Write run of FF: 1:63085      1:59609   1:62977

Test descriptions: N bit flip: picks N random non-overlapping bits and flips their value. Write X byte: overwrites a single byte with X. Partial write: picks a random cut point, overwrites everything from there to end with 0x00. Write garbage/run of X: picks two random cut points and fills everything in between with random values/X bytes.

Cool, thanks for testing that! The results for FNV-1a + srl-1 look promising, I think. Its failure rate is consistently about 1:2^16, which is the value you'd expect. That gives me some confidence that the additional shift is working as expected. BTW, which prime are you using for FNV-1a and FNV-1a+srl1? The official 32bit FNV one, 16777619. Offsets were just random numbers. Seems good enough given the following from the FNV page: These non-zero integers are the FNV-0 hashes of the following 32 octets: chongo <Landon Curt Noll> /\../\ The effect on false positive rates for double bit errors is particularly impressive. I'm now running a test run that shifts right by 13 to see how that works out; intuitively it should help disperse the bits a lot faster.
Empirical results are slightly better with shift of 13:

Single bit flip: 1:61615
Double bit flip: 1:58078
Triple bit flip: 1:66329
Quad bit flip: 1:62141
Write 0x00 byte: 1:66327
Write 0xFF byte: 1:65274
Partial write: 1:71939
Write garbage: 1:65095
Write run of 0: 1:62845
Write run of FF: 1:64638

Maybe, but it also makes *only* bits 14 and 15 actually affect bits below them, because all others are shifted out. If you choose the right prime it may still work; you'd have to pick one with enough lower bits set so that every bit affects bit 14 or 15 at some point… All in all a small shift seems better to me - if 1 for some reason isn't a good choice, I'd expect 3 or so to be a suitable replacement, but nothing much larger… I don't think the big shift is a problem: the other bits were taken into account by the multiply, and with the larger shift the next multiplication will disperse the changes once again. Nevertheless, I'm running the tests with shift of 3 now. I should have some time tomorrow to spend on this, and will try to validate our FNV-1a modification, and see if I find a way to judge whether 1 is a good shift. Great. I will spend some brain cycles on it too. Regards, Ants Aasma
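The empirical rates quoted in this exchange come from Monte Carlo corruption testing. A rough sketch of what such a harness looks like (illustrative: it uses a plain serial FNV-1a truncated to 16 bits, not the exact parallel variant being measured, and the function names are made up):

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_WORDS (8192 / 4)   /* one 8K page as 32-bit words */

/* Plain serial FNV-1a over the page, folded down to 16 bits. */
static uint16_t checksum16(const uint32_t *w, int n)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < n; i++)
        h = (h ^ w[i]) * 16777619u;
    return (uint16_t) (h ^ (h >> 16));
}

/* Fill a random page, flip one random bit, and count how often the
 * checksum fails to change; the reported rate is then trials:misses. */
static int count_misses(int trials, unsigned seed)
{
    static uint32_t page[PAGE_WORDS];
    int misses = 0;

    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < PAGE_WORDS; i++)
            page[i] = ((uint32_t) rand() << 16) ^ (uint32_t) rand();
        uint16_t before = checksum16(page, PAGE_WORDS);

        int word = rand() % PAGE_WORDS;
        page[word] ^= 1u << (rand() % 32);      /* single bit flip */

        if (checksum16(page, PAGE_WORDS) == before)
            misses++;
    }
    return misses;
}
```

The other corruption modes in the table (byte overwrites, partial writes, garbage runs) differ only in how the page is mutated between the two checksum calls.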
Re: [HACKERS] Enabling Checksums
On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past review, since all the tests I've been doing the last two weeks are on how fast an AMD Opteron 6234 with OS cache shared_buffers can run this. The main thing I'm still worried about is what happens when you have a fast machine that can move memory around very quickly and an in-memory workload, but it's hamstrung by the checksum computation--and it's not a 2013 Intel machine. The question I started with here was answered to some depth and then skipped past. I'd like to jerk attention back to that, since I thought some good answers from Ants went by. Is there a simple way to optimize the committed CRC computation (or a similar one with the same error detection properties) based on either:

a) Knowing that the input will be an 8K page, rather than the existing use case with an arbitrary sized WAL section.

b) Straightforward code rearrangement or optimization flags.

That was all I thought was still feasible to consider changing for 9.3 a few weeks ago. And the possible scope has only been shrinking since then. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now:

-Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits.

-Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though.
But taking full advantage of that will require adding CPU-specific code to the database.

-The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Enabling Checksums
On Thu, Apr 18, 2013 at 3:21 AM, Greg Smith g...@2ndquadrant.com wrote: On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past review, since all the tests I've been doing the last two weeks are on how fast an AMD Opteron 6234 with OS cache shared_buffers can run this. The main thing I'm still worried about is what happens when you have a fast machine that can move memory around very quickly and an in-memory workload, but it's hamstrung by the checksum computation--and it's not a 2013 Intel machine. The question I started with here was answered to some depth and then skipped past. I'd like to jerk attention back to that, since I thought some good answers from Ants went by. Is there a simple way to optimize the committed CRC computation (or a similar one with the same error detection properties) based on either: a) Knowing that the input will be an 8K page, rather than the existing use case with an arbitrary sized WAL section. b) Straightforward code rearrangement or optimization flags. That was all I thought was still feasible to consider changing for 9.3 a few weeks ago. And the possible scope has only been shrinking since then. Nothing from the two points, but the CRC calculation algorithm can be switched out for a slice-by-4 or slice-by-8 variant. Speed-up was around a factor of 4 if I remember correctly. And I reiterate that there is theory out there about the error detection capabilities of CRCs. I'm not seeing any theory here, which leaves me with very little confidence that we know what we're doing. Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums.
Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. That was only the most glaring problem. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. Actually the state is that with the polynomial used there is currently close to zero hope of CPUs optimizing for us. By switching the polynomial we can have hardware acceleration on Intel CPUs, with little hope of others supporting it, given that AMD hasn't by now and Intel touts patents in this area. However the calculation can be made about a factor of 4 faster by restructuring it. This optimization is plain C and not CPU specific. The committed checksum is an order of magnitude slower than the Fletcher one that was performance tested with the patch. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started.
I'm not sure about the 9.4 part: if we ship with the builtin CRC as committed, there is a 100% chance that we will want to switch out the algorithm in 9.4, and there will be quite a large subset of users that will find the performance unusable. If we change it to whatever we come up with here, there is a small chance that the algorithm will give a worse than expected error detection rate in some circumstances and we will want to offer a better algorithm. More probably it will be good enough, and the low performance hit will allow more users to turn it on. This is a 16bit checksum that we're talking about, not SHA-1; it is expected to occasionally fail to detect errors. I can provide you with a patch of the generic version of any of the discussed algorithms within an hour, leaving plenty of time in beta or in 9.4 to accommodate the optimized versions. It's literally a dozen self-contained lines of code. Regards, Ants Aasma
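The slice-by-4 restructuring Ants mentions processes four input bytes per iteration with independent table lookups instead of one serial lookup per byte. A plain-C sketch (shown here with the reflected CRC-32C polynomial from the neighboring discussion; the same structure applies to the CRC-32 polynomial PostgreSQL actually uses):

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t crc_tab[4][256];

/* Build the four lookup tables. Table 0 is the classic byte-at-a-time
 * table for the reflected CRC-32C polynomial 0x82F63B78; tables 1..3
 * extend each entry by one extra zero byte, so four input bytes can be
 * folded into the CRC in one step. */
static void crc_init(void)
{
    for (int i = 0; i < 256; i++) {
        uint32_t c = (uint32_t) i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
        crc_tab[0][i] = c;
    }
    for (int i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
                            crc_tab[0][crc_tab[t - 1][i] & 0xff];
}

/* Slice-by-4: four independent lookups per iteration instead of one
 * dependent lookup per byte; a byte-at-a-time tail handles leftovers. */
static uint32_t crc32c_slice4(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len >= 4) {
        crc ^= (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
               ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        crc = crc_tab[3][crc & 0xff] ^ crc_tab[2][(crc >> 8) & 0xff] ^
              crc_tab[1][(crc >> 16) & 0xff] ^ crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xff];
    return crc ^ 0xFFFFFFFFu;
}
```

The four lookups in the main loop have no dependencies on each other, which is where the roughly 4x speed-up over the byte-at-a-time loop comes from.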
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith g...@2ndquadrant.com wrote: Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. I was curious about the activity in this thread and wanted to understand the tradeoffs, and came to the same understanding as you when poking around. It seems the tough aspect of the equation is that the most well studied thing (CRC-32C) is slow unless you have special ISA support. Trying to find as much information and conclusive research on FNV was a lot more challenging; Fletcher is similar in that regard.
Given my hasty attempt to understand each of the alternatives, my qualitative judgement is that, strangely enough, the most conservative choice of the three (in terms of being understood and treated in the literature more than ten times over) is CRC-32C, but it's also the one being cast as only suitable inside micro-optimization. To add another, theoretically-oriented dimension to the discussion, I'd like to suggest it's also the most thoroughly studied of all the alternatives. I really had a hard time finding follow-up papers about the two alternatives, but to be fair, I didn't try very hard...then again, I didn't try very hard for any of the three, it's just that CRC32C was by far the easiest to find materials on. The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. Basically...if it comes down to understanding what the heck is going on and what the trade-offs are, it was a lot easier to brush up on CRC32-C in my meandering around the Internet. One might think this level of scrutiny would constitute a viable explanation of why CRC32C found its way into several standards and then finally into silicon. All in all, if the real-world costs of CRC32C on non-SSE4.2 hardware are allowable, I think it's the most researched and conservative option, although perhaps some of the other polynomials seen in Koopman could also be desirable. It seems there's a tradeoff in CRC polynomials between long-message and short-message error detection, and the paper above may allow for a more informed selection.
CRC32C is considered a good trade-off for both, but I haven't assessed the paper in enough detail to suggest whether there are specialized long-run polynomials that may be better still (although, then, there is also the microoptimization question, which postdates the literature I was looking at by a lot).
[HACKERS] confusing message about archive failures
When archive_command fails three times, it prints this message into the logs:

transaction log file "%s" could not be archived: too many failures

This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like "... too many failures, will try again later"?
Re: [HACKERS] Enabling Checksums
On 4/17/13 8:56 PM, Ants Aasma wrote: Nothing from the two points, but the CRC calculation algorithm can be switched out for a slice-by-4 or slice-by-8 variant. Speed-up was around a factor of 4 if I remember correctly...I can provide you with a patch of the generic version of any of the discussed algorithms within an hour, leaving plenty of time in beta or in 9.4 to accommodate the optimized versions. Can you nail down a solid, potential-for-commit slice-by-4 or slice-by-8 patch then? You dropped into things like per-byte overhead to reach this conclusion, which was fine to let the methods battle each other. Maybe I missed it, but I don't remember seeing an obvious full patch for this implementation then come back up from that. With the schedule pressure this needs to return to more database-level tests. Your concerns about the committed feature being much slower than the original Fletcher one are troubling, and we might as well do that showdown again now with the best of the CRC implementations you've found. Actually the state is that with the [CRC] polynomial used there is currently close to zero hope of CPUs optimizing for us. Ah, I didn't catch that before. It sounds like the alternate slicing implementation should also use a different polynomial then, which sounds reasonable. This doesn't even have to be exactly the same CRC function that the WAL uses. A CRC that's modified for performance or having a better future potential is fine; there's just a lot of resistance to using something other than a CRC right now. I'm not sure about the 9.4 part: if we ship with the builtin CRC as committed, there is a 100% chance that we will want to switch out the algorithm in 9.4, and there will be quite a large subset of users that will find the performance unusable. Now I have to switch out my reviewer hat for my 3-bit fortune telling one.
(It uses a Magic 8 Ball) This entire approach is squeezing what people would prefer to be a 32 bit CRC into a spare 16 bits, as a useful step advancing toward a long term goal. I have four major branches of possible futures here I've thought about:

1) Database checksums with 16 bits are good enough, but they have to be much faster to satisfy users. It may take a different checksum implementation altogether to make that possible, and distinguishing between the two of them requires borrowing even more metadata bits from somewhere. (This seems the future you're worried about)

2) Database checksums work out well, but they have to be 32 bits to satisfy users and/or error detection needs. Work on pg_upgrade and expanding the page headers will be needed. Optimization of the CRC now has a full 32 bit target.

3) The demand for database checksums is made obsolete by either mainstream filesystem checksumming, performance issues, or just general market whim. The 16 bit checksum PostgreSQL implements becomes a vestigial feature, and whenever it gets in the way of making changes someone proposes eliminating them. (I call this one the rules future)

4) 16 bit checksums turn out to be such a problem in the field that everyone regrets the whole thing, and discussions turn immediately toward how to eliminate that risk.

It's fair that you're very concerned about (1), but I wouldn't give it 100% odds of happening either. The user demand that's motivated me to work on this will be happy with any of (1) through (3), and in two of them optimizing the 16 bit checksums now turns out to be premature. -- Greg Smith
Re: [HACKERS] confusing message about archive failures
On Wednesday, April 17, 2013, Peter Eisentraut wrote: When archive_command fails three times, it prints this message into the logs: transaction log file \%s\ could not be archived: too many failures This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like ... too many failures, will try again later ? +1 I've found the current message alarming/confusing as well. But I don't really understand the logic behind bursting the attempts, 3 of them one second apart, then sleeping 57 seconds, in the first place. Cheers, Jeff
[HACKERS] Word-level bigrams/trigrams in tsvector
I'm wondering how I can store word-level bigrams/trigrams in a tsvector that I can query against. I was expecting the final query to match 'the air' and return the one tuple to me. For instance:

postgres=# create table docs (a tsvector);
CREATE TABLE
postgres=# insert into docs (a) values (strip('''the air'' smells ''sea water'''::tsvector));
INSERT 0 1
postgres=# select * from docs;
               a
--------------------------------
 'sea water' 'smells' 'the air'
(1 row)

postgres=# select * from docs where a @@ to_tsquery('''the air''');
 a
---
(0 rows)

Thanks, Alan