Re: [HACKERS] stored procedures

2013-04-16 Thread aasat
Are stored procedures planned for the future? I think it is the most-missed
feature in Postgres today.

Using dblink to emulate a commit inside a transaction is very complicated.
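
For example, a rough sketch of that workaround (the connection string and
the audit_log table are invented for illustration; the dblink extension
must be installed):

    -- Open a second connection to the same database. Statements sent
    -- through it commit on their own, independently of our transaction.
    SELECT dblink_connect('aux', 'dbname=mydb');

    BEGIN;
    -- This insert persists even if the outer transaction rolls back,
    -- because it runs and autocommits in the separate 'aux' connection:
    SELECT dblink_exec('aux',
        'INSERT INTO audit_log(msg) VALUES (''step 1 done'')');
    ROLLBACK;

    SELECT dblink_disconnect('aux');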








Re: [HACKERS] stored procedures

2013-04-16 Thread Pavel Stehule
Hello


2013/4/16 aasat satri...@veranet.pl

 Are stored procedures planned for the future? I think it is the most-missed
 feature in Postgres today.


It is on the ToDo list, but probably nobody is working on this feature at the moment.



 Using dblink to emulate a commit inside a transaction is very complicated.


Probably autonomous transactions will be implemented first - that should be a
really nice feature.

Regards

Pavel










[HACKERS] [PATCH] Add \ns command to psql

2013-04-16 Thread Colin 't Hart
Hi,

Here's a new version of a small patch to psql I'm using locally.

It adds a command \ns to psql which is a shortcut for setting the
search_path variable.
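
For example (hypothetical usage; the schema name is made up):

    -- with the patch:
    \ns accounting

    -- which is shorthand for:
    SET search_path TO accounting;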

I'd like to make a case for including this patch as it makes use of
schemas/namespaces much easier. There was resistance to including this
before just because some developers don't use schemas very much. But
we use a lot of them. And I'm sure we're not alone.

Previously I used just \n but there was some resistance to this
because the single letter commands are becoming scarce.

I've also added tab completion, making this command much more useful. I
don't think tab completion would be possible if this command was
defined as a variable (which was another suggestion offered at the
time).


Cheers,

Colin


command.c.diff
Description: Binary data


tab-complete.c.diff
Description: Binary data



Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 15 April 2013 21:53, Simon Riggs si...@2ndquadrant.com wrote:

 So I'll treat this as two separate cases:
 * add special case for sequences

Patch attached.

 * use the NO SQL mechanism, as described, which implies no reads or
 writes of database state. We could test that, but it's somewhat harder
 and we'd need to test for that on entry to any function, which I don't
 much like.

For another time.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_tuning_with_default_nextval.v1.patch
Description: Binary data



Re: [HACKERS] event trigger API documentation?

2013-04-16 Thread Robert Haas
On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut pete...@gmx.net wrote:
 I'm having trouble finding documentation about how to write event
 triggers.  The chapter in the documentation

 http://www.postgresql.org/docs/devel/static/event-triggers.html

 says they can be written in C or supported PLs, but does not explain it
 any further.  Is there any documentation for it?

That chapter has two subsections that may be useful, and there's a bit
more here:

http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER

Now that you mention it, though, it looks a little sparse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Heikki Linnakangas

On 16.04.2013 14:38, Simon Riggs wrote:

On 15 April 2013 21:53, Simon Riggs si...@2ndquadrant.com wrote:


So I'll treat this as two separate cases:
* add special case for sequences


Patch attached.

+   if (IsA(node, FuncExpr))
+   {
+       FuncExpr   *expr = (FuncExpr *) node;
+
+       /*
+        * For this case only, we want to ignore the volatility of the
+        * nextval() function, since some callers want this. nextval()
+        * has no other args other than sequence name, so we can just
+        * return false immediately in this case.
+        */
+       if (expr->funcid == F_NEXTVAL_OID)
+           return false;
+       else if (func_volatile(expr->funcid) == PROVOLATILE_VOLATILE)
+           return true;
+       /* else fall through to check args */
+   }
...


You still need to check the args if the function is nextval; otherwise
you incorrectly perform the optimization for something like
nextval(myvolatilefunc()).
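
For instance, a sketch of such a default (pick_seq() is an invented
volatile function returning a regclass):

    CREATE SEQUENCE s1;
    CREATE SEQUENCE s2;

    CREATE FUNCTION pick_seq() RETURNS regclass AS
    $$ SELECT (CASE WHEN random() < 0.5 THEN 's1' ELSE 's2' END)::regclass $$
    LANGUAGE sql VOLATILE;

    -- The default is a nextval() call, but its argument is itself
    -- volatile, so it must not be treated as safe for the optimization:
    CREATE TABLE t (id bigint DEFAULT nextval(pick_seq()), v text);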


- Heikki




Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 16 April 2013 13:57, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 You still need to check the args, if the function is nextval, otherwise you
 incorrectly perform the optimization for something like
 nextval(myvolatilefunc()).

Guess so. At least it's an easy change.

Thanks for checking.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_tuning_with_default_nextval.v2.patch
Description: Binary data



Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 02:37:33PM +0100, Simon Riggs wrote:
 On 16 April 2013 13:57, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
  You still need to check the args if the function is nextval; otherwise you
  incorrectly perform the optimization for something like
  nextval(myvolatilefunc()).
 
 Guess so. At least it's an easy change.

I found Simon's nextval()/COPY timings without this patch sobering.  I
assume he can apply this for 9.3, right?  I believe it is a fix for a
new 9.3 feature.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 9 April 2013 03:35, Ants Aasma a...@cybertec.at wrote:
 On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote:
 Unless somebody tells me not to waste my time I'll go ahead and come
 up with a workable patch by Monday.

 And here you go. I decided to be verbose with the comments, as it's
 easier to delete a comment than to write one. I also left in a huge jumble
 of macros to calculate the contents of a helper var during compile
 time. This can easily be replaced with the calculated values once we
 settle on specific parameters.

 Currently only x86-64 is implemented. 32bit x86 would be mostly a
 copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
 For other platforms the simplest way would be to use a vectorizing
 compiler on the generic variant. -funroll-loops -ftree-vectorize is
 enough on gcc.

 Quick bench results on the worst case workload:
 master no checksums: tps = 15.561848
 master with checksums: tps = 1.695450
 simd checksums: tps = 14.602698

Numbers look very good on this. Well done.

I support the direction of this, but I don't think I'm sufficiently
well qualified to verify that the code does what it should and/or fix
it if it breaks. If others want to see this happen you'll need to
pitch in.

My only review comments are to ask for some explanation of the magic numbers...
#define CSUM_PRIME1 0x49
#define CSUM_PRIME2 0x986b
#define CSUM_TRUNC 65521

Where magic means a level of technology far above my own
understanding, and yet no (or not enough) code comments to assist me.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 I found Simon's nextval()/COPY timings without this patch sobering.  I
 assume he can apply this for 9.3, right?  I believe it is a fix for a
 new 9.3 feature.

It is not a fix, it is not for a 9.3 feature (the multi-insert thing
went in in 9.2), and personally I'd vote against applying it now,
especially if Simon is expecting anybody to review it.

regards, tom lane




Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 10:07:07AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  I found Simon's nextval()/COPY timings without this patch sobering.  I
  assume he can apply this for 9.3, right?  I believe it is a fix for a
  new 9.3 feature.
 
 It is not a fix, it is not for a 9.3 feature (the multi-insert thing
 went in in 9.2), and personally I'd vote against applying it now,
 especially if Simon is expecting anybody to review it.

Oh, I thought multi-insert was 9.3.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 16 April 2013 15:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 I found Simon's nextval()/COPY timings without this patch sobering.  I
 assume he can apply this for 9.3, right?  I believe it is a fix for a
 new 9.3 feature.

 It is not a fix, it is not for a 9.3 feature (the multi-insert thing
 went in in 9.2), and personally I'd vote against applying it now,
 especially if Simon is expecting anybody to review it.

I wrote this expecting it to be a 9.4 feature. I'm good with that.

There are other somewhat related optimisations also.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Greg Stark
On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote:
 * WAL checksum is not used as the sole basis for end-of-WAL discovery.
 We reuse the WAL files, so the prev field in each WAL record shows
 what the previous end of WAL was. Hence if the WAL checksums give a
 false positive we still have a double check that the data really is
 wrong. It's unbelievable that you'd get a false positive and then have
 the prev field match as well, even though it was the genuine
 end-of-WAL.

This is kind of true and kind of not true. If a system loses power
while writing lots of data to WAL then the blocks at the end of the
WAL might not be written out in order. Everything since the last log
sync might be partly written out and partly not written out. That's
the case where the checksum is critical. The beginning of a record
could easily be written out including xl_prev and the end of the
record not written. Roughly 1 in 64,000 such power losses would then end
up with an assertion failure or a corrupt database.



-- 
greg




Re: [HACKERS] event trigger API documentation?

2013-04-16 Thread Peter Eisentraut
On 4/16/13 7:49 AM, Robert Haas wrote:
 On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut pete...@gmx.net wrote:
 I'm having trouble finding documentation about how to write event
 triggers.  The chapter in the documentation

 http://www.postgresql.org/docs/devel/static/event-triggers.html

 says they can be written in C or supported PLs, but does not explain it
 any further.  Is there any documentation for it?
 
 That chapter has two subsections that may be useful, and there's a bit
 more here:
 
 http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
 
 Now that you mention it, though, it looks a little sparse.

I'm specifically looking for C API documentation, along the lines of
http://www.postgresql.org/docs/devel/static/trigger-interface.html.

The current chapter on event triggers might as well be ripped out and
folded into the CREATE EVENT TRIGGER reference page, because it explains
nothing about programming those triggers.





Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote:
 * WAL checksum is not used as the sole basis for end-of-WAL discovery.
 We reuse the WAL files, so the prev field in each WAL record shows
 what the previous end of WAL was. Hence if the WAL checksums give a
 false positive we still have a double check that the data really is
 wrong. It's unbelievable that you'd get a false positive and then have
 the prev field match as well, even though it was the genuine
 end-of-WAL.

 This is kind of true and kind of not true. If a system loses power
 while writing lots of data to WAL then the blocks at the end of the
 WAL might not be written out in order. Everything since the last log
 sync might be partly written out and partly not written out. That's
 the case where the checksum is critical. The beginning of a record
 could easily be written out including xl_prev and the end of the
 record not written. Roughly 1 in 64,000 such power losses would then end
 up with an assertion failure or a corrupt database.

I have a hard time believing that it's a good idea to add checksums to
data pages and at the same time weaken our ability to detect WAL
corruption.  So this seems to me to be going in the wrong direction.
What's it buying for us anyway?  A few CPU cycles saved during WAL
generation?  That's probably swamped by the other costs of writing WAL,
especially if you're using replication.

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 16 April 2013 20:27, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote:
 * WAL checksum is not used as the sole basis for end-of-WAL discovery.
 We reuse the WAL files, so the prev field in each WAL record shows
 what the previous end of WAL was. Hence if the WAL checksums give a
 false positive we still have a double check that the data really is
 wrong. It's unbelievable that you'd get a false positive and then have
 the prev field match as well, even though it was the genuine
 end-of-WAL.

 This is kind of true and kind of not true. If a system loses power
 while writing lots of data to WAL then the blocks at the end of the
 WAL might not be written out in order. Everything since the last log
 sync might be partly written out and partly not written out. That's
 the case where the checksum is critical. The beginning of a record
 could easily be written out including xl_prev and the end of the
 record not written. Roughly 1 in 64,000 such power losses would then end
 up with an assertion failure or a corrupt database.

 I have a hard time believing that it's a good idea to add checksums to
 data pages and at the same time weaken our ability to detect WAL
 corruption.  So this seems to me to be going in the wrong direction.
 What's it buying for us anyway?  A few CPU cycles saved during WAL
 generation?  That's probably swamped by the other costs of writing WAL,
 especially if you're using replication.

This part of the thread is dead now; I said let's drop this idea
on 13 April.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote:
 Based on current analysis, it is particularly good at detecting single
 bit errors, as good at detecting burst errors as can be expected from
 16 bits and not horrible at detecting burst writes of zeroes. It is
 quite bad at detecting multiple uncorrelated single bit errors and
 extremely bad at detecting repeating patterns of errors in low order
 bits.

I've read the patch and tried to understand why it's that bad at
detecting repeating patterns of errors in low order bits, and to see
if there might be a way to fix that without too much of a performance
impact.

Here's what I gather the algorithm does:

  It treats the input data, a page of L bytes, as an Nx64 matrix V
  of 16-bit quantities (N = L/128, since each row holds 64 16-bit values).
  It then first computes (using two primes p (PRIME1) and q (PRIME2))

S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
  + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
  + …
  + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
  (mod 2^16)
  = sum V[i,j]*p^(64-i)*q^(64-j)

   Note that it does that by first computing the row-wise sums without
   the q^i coefficient, and then (in what the code calls the aggregation
   phase) combines those row-wise sums into a total, adding the q^i-
   coefficients along the way.

   The final hash value is then

 H = S * p + B * q mod 2^16

   where B is a salt value intended to detect swapped pages (currently
   B is simply the page index)

This raises two questions. First, why are there two primes? You could
just as well use a single prime q and set p=q^64 mod 2^16. You then
get
  S = sum V[i,j] * q^(64*(64-i) + (64-j))
= sum V[i,j] * q^(4096 - 64*(i-1) - j)
You get higher prime powers that way, but you can easily choose a prime
that yields distinct values mod 2^16 for exponents up to 16383. Your
PRIME2, for example, does. (It wraps around for 16384, i.e.
PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
16384 is the Carmichael function's value at 2^16)

Second, why does it use addition instead of XOR? It seems that FNV
usually XORs the terms together instead of adding them?

Regarding the bad behaviour for multiple low-bit errors - can you
explain why it behaves badly in that case? I currently fail to see
why that would be. I *can* see that the lowest bit of the hash depends
only on the lowest bit of the input words, but as long as the lowest
bits of the input words also affect other bits of the hash, that shouldn't
matter. Which I think they do, but I might be missing something...

Here, btw, is a page on FNV hashing. It mentions a few rules for
picking suitable primes

http://www.isthe.com/chongo/tech/comp/fnv

best regards,
Florian Pflug





Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 12:13:48PM -0700, Adrian Klaver wrote:
 On 04/16/2013 08:07 AM, Nigel Heron wrote:
 
 On 04/15/2013 05:57 PM, Adrian Klaver wrote:
 On 04/15/2013 02:42 PM, Nigel Heron wrote:
 Hi,
 is there a way to clear the session state of sequence values fetched by
 currval(regclass)? DISCARD ALL doesn't seem to do it.
 
 snip
 Might want to take a look at:
 
 http://www.depesz.com/2012/12/02/what-is-the-point-of-bouncing/
 
 for some hints on dealing with sequences and pgBouncer.
 
 
 thanks, I read it (his blogs are always interesting!). I'm not disputing
 that calling currval() at the wrong time is a bad idea.
 I'm just wondering why DISCARD ALL clears everything but this?
 
 Well per the docs:
 
 http://www.postgresql.org/docs/9.2/interactive/sql-discard.html
 
 DISCARD ALL
 
 is equivalent to:
 
 SET SESSION AUTHORIZATION DEFAULT;
 RESET ALL;
 DEALLOCATE ALL;
 CLOSE ALL;
 UNLISTEN *;
 SELECT pg_advisory_unlock_all();
 DISCARD PLANS;
 DISCARD TEMP;
 
 AFAIK, none of the above affect sequences.

I think his point is why don't we clear currval() on DISCARD ALL?  I
can't think of a good reason we don't.  He is saying currval() should
throw an error after DISCARD ALL:

test=> SELECT currval('seq');
ERROR:  currval of sequence "seq" is not yet defined in this session

I have moved this thread to hackers to get comments.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 9 April 2013 03:35, Ants Aasma a...@cybertec.at wrote:
 On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote:
 Unless somebody tells me not to waste my time I'll go ahead and come
 up with a workable patch by Monday.

 And here you go. I decided to be verbose with the comments, as it's
 easier to delete a comment than to write one. I also left in a huge jumble
 of macros to calculate the contents of a helper var during compile
 time. This can easily be replaced with the calculated values once we
 settle on specific parameters.

 Currently only x86-64 is implemented. 32bit x86 would be mostly a
 copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
 For other platforms the simplest way would be to use a vectorizing
 compiler on the generic variant. -funroll-loops -ftree-vectorize is
 enough on gcc.

 Quick bench results on the worst case workload:
 master no checksums: tps = 15.561848
 master with checksums: tps = 1.695450
 simd checksums: tps = 14.602698

 Numbers look very good on this. Well done.

 I support the direction of this, but I don't think I'm sufficiently
 well qualified to verify that the code does what it should and/or fix
 it if it breaks. If others want to see this happen you'll need to
 pitch in.

 My only review comments are to ask for some explanation of the magic 
 numbers...
 #define CSUM_PRIME1 0x49
 #define CSUM_PRIME2 0x986b
 #define CSUM_TRUNC 65521

 Where magic means a level of technology far above my own
 understanding, and yet no (or not enough) code comments to assist me.

The specific values used are mostly magic to me too. As mentioned in a
short sentence in the patch, the values are experimentally chosen,
guided by some intuition about what good values should look like.

Basically the methodology for the choice was that I took all the pages
from a 2.8GB test database, and then for each page introduced a bunch
of common errors and observed how many errors were undetected. The
main observations were:
1) the exact value of the primes doesn't really matter for detection
   efficiency.
2) values with a non-uniform distribution of zeroes and ones seem to
   work slightly better.

I'll write up a readme of why the values are how they are and with
some more explanation of the algorithm.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 I think his point is why don't we clear currval() on DISCARD ALL?  I
 can't think of a good reason we don't.

Because we'd have to invent a new suboperation DISCARD SEQUENCES,
for one thing, in order to be consistent.  I'd rather ask why it's
important that we should throw away such state.  It doesn't seem to
me to be important enough to justify a new subcommand.

Or, if you'd rather a more direct answer: wanting this sounds like
evidence of bad application design.  Why is your app dependent on
getting failures from currval, and isn't there a better way to do it?

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Ants Aasma a...@cybertec.at writes:
 On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs si...@2ndquadrant.com wrote:
 My only review comments are to ask for some explanation of the magic 
 numbers...

 The specific values used are mostly magic to me too. As mentioned in a
 short sentence in the patch, the values are experimentally chosen,
 guided by some intuition about what good values should look like.

There actually is quite a lot of theory out there about this sort of
thing.  If we are inventing our own checksum function then We're Doing
It Wrong.  We should be adopting an existing, proven function.
Experimentally derived is about the worst recommendation I can think
of in this area.

regards, tom lane




Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Fabrízio de Royes Mello
On Tue, Apr 16, 2013 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 [...]

 Or, if you'd rather a more direct answer: wanting this sounds like
 evidence of bad application design.  Why is your app dependent on
 getting failures from currval, and isn't there a better way to do it?


The sequence cache (seqtab) is kept per backend, so if we use a
connection pooler (like pgbouncer in session mode) between our app and
postgres we can get a wrong value from 'currval' because the backend isn't
completely clean.
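
A minimal illustration of the leak (sequence name invented; assumes the
pooler runs DISCARD ALL before reusing the backend):

    -- Client A, through the pooler:
    SELECT nextval('myseq');   -- sets myseq's currval in this backend

    -- Client A disconnects; the pooler runs DISCARD ALL and hands the
    -- same backend to client B.

    -- Client B:
    SELECT currval('myseq');   -- returns A's value instead of raising
                               -- "currval ... not yet defined"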

Isn't this a good reason to implement this feature?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug f...@phlo.org wrote:
 On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote:
 Based on current analysis, it is particularly good at detecting single
 bit errors, as good at detecting burst errors as can be expected from
 16 bits and not horrible at detecting burst writes of zeroes. It is
 quite bad at detecting multiple uncorrelated single bit errors and
 extremely bad at detecting repeating patterns of errors in low order
 bits.

 I've read the patch and tried to understand why it's that bad at
 detecting repeating patterns of errors in low order bits, and to see
 if there might be a way to fix that without too much of a performance
 impact.

 Here's what I gather the algorithm does:

   It treats the input data, a page of L bytes, as an Nx64 matrix V
   of 16-bit quantities (N = L/128, since each row holds 64 16-bit values).
   It then first computes (using two primes p (PRIME1) and q (PRIME2))

 S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
   + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
   + …
   + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
   (mod 2^16)
   = sum V[i,j]*p^(64-i)*q^(64-j)

Note that it does that by first computing the row-wise sums without
the q^i coefficient, and then (in what the code calls the aggregation
phase) combines those row-wise sums into a total, adding the q^i-
coefficients along the way.

The final hash value is then

  H = S * p + B * q mod 2^16

where B is a salt value intended to detect swapped pages (currently
B is simply the page index)

Great job analyzing the analytic form of the algorithm, and sorry you
had to do it instead of finding it in the documentation.

 This raises two questions. First, why are there two primes? You could
 just as well use a single prime q and set p=q^64 mod 2^16. You then
 get
   S = sum V[i,j] * q^(64*(64-i) + (64-j))
 = sum V[i,j] * q^(4096 - 64*(i-1) - j)
 You get higher prime powers that way, but you can easily choose a prime
 that yields distinct values mod 2^16 for exponents up to 16383. Your
 PRIME2, for example, does. (It wraps around for 16384, i.e.
 PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
 16384 is the Carmichael function's value at 2^16)

The experimental detection rate is about the same if we use a single
prime. But I think you have the analytical form wrong here. It should
be, given q = p:

S = sum V[i,j] * p^(64-i) * p^(64-j)
  = sum V[i,j] * p^(64 - i + 64 - j)
  = sum V[i,j] * p^(128 - i -j)

This makes the whole matrix symmetric. While I can't think of any real
world errors that would exhibit symmetry in this 64x64 matrix, it
seemed better to me to avoid the issue altogether and use different
primes. IIRC it helped a lot for the case of page[i] = i & 0xFF.

 Second, why does it use addition instead of XOR? It seems that FNV
 usually XORs the terms together instead of adding them?

Testing showed slightly better detection rate for adds. Intuitively I
think it's because the carry introduces some additional mixing.

 Regarding the bad behaviour for multiple low-bit errors - can you
 explain why it behaves badly in that case? I currently fail to see
 why that would be. I *can* see that the lowest bit of the hash depends
 only on the lowest bit of the input words, but as long as the lowest
 bits of the input words also affect other bits of the hash, that shouldn't
 matter. Which I think they do, but I might be missing something...

Looks like you're right. I was somehow concentrating only on how the
lowest bits depend on the input.

 Here, btw, is a page on FNV hashing. It mentions a few rules for
 picking suitable primes

 http://www.isthe.com/chongo/tech/comp/fnv

Unfortunately the rules don't apply here because of the hash size.

Thanks for your analysis. I will do my best to get this all into a
document and will do some more analysis to see if I can come up with
some kind of proof for the error cases.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-16 Thread Greg Smith

On 4/13/13 12:44 PM, Tomas Vondra wrote:

I'm currently struggling with (quite uncommon) deployments where
databases are created/dropped regularly (not to mention tables and
indexes), and it's surprisingly difficult to process such stats to get
reasonable values.


Yes, it's a pain.  If you aggregate the table level data available now, 
you'll end up with some missing activity.  Work done between the last 
snapshot and when the drop happened is gone right now, whereas your 
aggregated stats view would preserve that activity.  The real fun is if 
the new table has the same name as the old one, which gives you all the 
negative value headaches a pg_stat_reset introduces too.


It's possible to make a case for why database level aggregate statistics 
are useful.  I don't know that yours is compelling enough to justify 
itself though, given that as you say this is an uncommon situation.  In 
something similar to your setup I've just accepted that I have to save 
the snapshots into a table, will occasionally lose some mid-snapshot 
data, and I use a combination of correction updates to that table and 
SQL window functions to provide a useful view.  It's a pain to do and I 
end up having to customize this approach for seemingly every install, 
but it can be made accurate enough for most uses.


The gigantic hole in this area I was most interested in for 9.4 
development was the lack of write statistics.  Not having 
pg_stat_database.blks_write, pg_statio_user_tables.heap_blks_write or 
pg_statio_user_indexes.idx_blks_write is a crippling operations/planning 
limitation of the database.  From that perspective, now isn't quite the 
right time to add more aggregation on top of that data, since the 
aggregation will make adding additional counters a bit more complicated. 
 It's not a big difference, but thinking in that direction doesn't help 
your suggestion.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] Change to pg_test_fsync output

2013-04-16 Thread Bruce Momjian
I propose the attached patch to change pg_test_fsync's output from
microsecs to usecs, which is the designation we use in other places.
Also remove parentheses, e.g. 

 1 * 16kB open_sync write8012.933 ops/sec 125 usecs/op
 2 *  8kB open_sync writes   5399.901 ops/sec 185 usecs/op
 4 *  4kB open_sync writes   3340.958 ops/sec 299 usecs/op
 8 *  2kB open_sync writes   1903.621 ops/sec 525 usecs/op
16 *  1kB open_sync writes   1032.606 ops/sec 968 usecs/op
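
(The two columns are reciprocals: usecs/op is just 1,000,000 divided by
ops/sec, so 8012.933 ops/sec works out to about 125 usecs/op.)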

This is new output for 9.3.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 7bc0d0e..b978d9e
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***
***************
*** 25,31 ****
  
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec (%6.0f microsecs/op)"
  #define USECS_SEC			1000000
  
  /* These are macros to avoid timing the function call overhead. */
--- 25,31 ----
  
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec  %6.0f usecs/op"
  #define USECS_SEC			1000000
  
  /* These are macros to avoid timing the function call overhead. */



Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr16, 2013, at 23:41 , Ants Aasma a...@cybertec.at wrote:
 On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug f...@phlo.org wrote:
 On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote:
 Based on current analysis, it is particularly good at detecting single
 bit errors, as good at detecting burst errors as can be expected from
 16 bits and not horrible at detecting burst writes of zeroes. It is
 quite bad at detecting multiple uncorrelated single bit errors and
 extremely bad at detecting repeating patterns of errors in low order
 bits.
 
 I've read the patch and tried to understand why it's that bad at
 detecting repeating patterns of errors in low order bits, and to see
 if there might be a way to fix that without too much of a performance
 impact.
 
 Here's what I gather the algorithm does:
 
  It treats the input data, a page of L bytes, as an Nx64 matrix V
  of 16-bit quantities (N = L/128, since each row holds 64 16-bit values).
  It then first computes (using two primes p (PRIME1) and q (PRIME2))
 
S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
  + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
  + …
  + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
  (mod 2^16)
  = sum V[i,j]*p^(64-i)*q^(64-j)
 
   Note that it does that by first computing the row-wise sums without
   the q^i coefficient, and then (in what the code calls the aggregation
   phase) combines those row-wise sums into a total, adding the q^i-
   coefficients along the way.
 
   The final hash value is then
 
 H = S * p + B * q mod 2^16
 
   where B is a salt value intended to detect swapped pages (currently
   B is simply the page index)
 
 Great job analyzing the analytic form of the algorithm, and sorry you
 had to do it instead of finding it in the documentation.

No problem, glad if I can help!

 This raises two questions. First, why are there two primes? You could
 just as well use a single prime q and set p=q^64 mod 2^16. You then
 get
   S = sum V[i,j] * q^(64*(64-i) + (64-j))
= sum V[i,j] * q^(4096 - 64*(i-1) - j)
 You get higher prime powers that way, but you can easily choose a prime
 that yields distinct values mod 2^16 for exponents up to 16383. Your
 PRIME2, for example, does. (It wraps around for 16384, i.e.
 PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
 16384 is the Carmichael function's value at 2^16)
 
 The experimental detection rate is about the same if we use a single
 prime. But I think you have the analytical form wrong here. It should
 be, given q = p:
 
S = sum V[i,j] * p^(64-i) * p^(64-j)
  = sum V[i,j] * p^(64 - i + 64 - j)
  = sum V[i,j] * p^(128 - i -j)

Yeah, if you set q = p that's true. My suggestion was p=q^64 though...

 Second, why does it use addition instead of XOR? It seems that FNV
 usually XORs the terms together instead of adding them?
 
 Testing showed slightly better detection rate for adds. Intuitively I
 think it's because the carry introduces some additional mixing.

Hm, but OTOH it makes S linear in V, i.e. if you have two inputs
V1,V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then
S' = S*m. The second property is quite undesirable, I think. Assume
all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit
positions 0..(k-1). Then, due to linearity, S is also divisible by
2^k, i.e. also has no ones before the k-th bit. This means, for example,
that if you hash values which all have their lowest bit cleared,
you get only 2^15 distinct hash values. If they all have the two
lowest bits cleared, you get only 2^14 distinct values, and so on…

Generally, linearity doesn't seem to be a property that one wants
in a hash I think, so my suggestion is to stick to XOR.

 Here, btw, is a page on FNV hashing. It mentions a few rules for
 picking suitable primes
 
 http://www.isthe.com/chongo/tech/comp/fnv
 
 Unfortunately the rules don't apply here because of the hash size.

Yeah :-(.

I noticed that their 32-bit prime only has a single one outside
the first 16 bits. Maybe we can take advantage of that and use a
32-bit state while still providing decent performance on machines
without a 32-bit x 32-bit -> 32-bit multiply instruction?

If we lived in an Intel-only world, I'd suggest going with a
32-bit state, since SSE4.1 support is *very* wide-spread already -
the last CPUs without it came out over 5 years ago, I think.
(Core2 and later support SSE4.1, and some later Core1 do too)

But unfortunately things look bleak even for other x86
implementations - AMD support SSE4.1 only starting with
Bulldozer, which came out 2011 or so I believe. Leaving the x86
realm, it seems that only ARM's NEON provides the instructions
we'd need - AltiVec seems to support only 16-bit multiplies,
and from what some quick googling brought up, MIPS and SPARC
SIMD instructions look no better..

OTOH, chances are that nobody will ever do SIMD implementations
for those machines. In that case, working in 32-bit chunks 

[HACKERS] ancient sequence point bug

2013-04-16 Thread Peter Eisentraut
This code in bootstrap.c contains a sequence point violation (or
whatever that is really called):

while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
    (*app)->am_oid = HeapTupleGetOid(tup);
    memmove((char *) (*app++)->am_typ,
            (char *) GETSTRUCT(tup),
            sizeof((*app)->am_typ));
}

In commit 1aebc361, another place in the same file was fixed like this:

@@ -445,13 +455,13 @@ struct typmap
while (HeapTupleIsValid(tup = heap_getnext(scan, 0)))
{
(*app)->am_oid = tup->t_data->t_oid;
-   memmove((char *) (*app++)->am_typ,
-   (char *) GETSTRUCT(tup),
-   sizeof((*app)->am_typ));
+   memcpy((char *) (*app)->am_typ,
+  (char *) GETSTRUCT(tup),
+  sizeof((*app)->am_typ));
+   app++;
+   app++;
}
heap_endscan(scan);
heap_close(rel, NoLock);

I think the same fix (move the app++ out of the call, and optionally change
memmove to memcpy) should be done in the first case as well.





Re: [HACKERS] ancient sequence point bug

2013-04-16 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 This code in bootstrap.c contains a sequence point violation (or
 whatever that is really called):

 while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 {
 (*app)->am_oid = HeapTupleGetOid(tup);
 memmove((char *) (*app++)->am_typ,
 (char *) GETSTRUCT(tup),
 sizeof((*app)->am_typ));
 }

What exactly is the violation?  sizeof() is a side-effect-free compile
time constant, and the first value to be passed to memmove seems
perfectly well defined.

I grant that this is not terribly good coding style, but I don't believe
there's an actual risk here.

 In commit 1aebc361, another place in the same file was fixed like this:

Well, I don't really remember why I did that twelve years ago, but
seeing that there are a number of purely-cosmetic changes in that
commit, I'm thinking it was only meant to be cosmetic.

I certainly have no objection to making this code look more like that
code, I'm just not seeing that it's a bug.

regards, tom lane




Re: [HACKERS] ancient sequence point bug

2013-04-16 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, April 16, 2013 7:52 PM
To: Peter Eisentraut
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] ancient sequence point bug

Peter Eisentraut pete...@gmx.net writes:
 This code in bootstrap.c contains a sequence point violation (or 
 whatever that is really called):

 while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 {
 (*app)->am_oid = HeapTupleGetOid(tup);
 memmove((char *) (*app++)->am_typ,
 (char *) GETSTRUCT(tup),
 sizeof((*app)->am_typ));
 }

What exactly is the violation?  sizeof() is a side-effect-free compile time 
constant, and the first value to be passed to memmove seems perfectly well 
defined.

Unless it is C99 and the object is a VLA, which must be evaluated at runtime.  
I guess that (*app)->am_typ is not a VLA, especially since PostgreSQL does not
require C99 to compile.

I grant that this is not terribly good coding style, but I don't believe 
there's an actual risk here.

I agree that there is no risk in the sizeof operator.
According to my understanding, even this is legal:

unsigned long long *p = 0;
size_t stupid = sizeof (*p);
printf("size of a long long is %u\n", (unsigned) stupid);

But I also guess that most compilers will have a cow when scanning it.

 In commit 1aebc361, another place in the same file was fixed like this:

Well, I don't really remember why I did that twelve years ago, but seeing that 
there are a number of purely-cosmetic changes in that commit, I'm thinking it 
was only meant to be cosmetic.

I certainly have no objection to making this code look more like that code, 
I'm just not seeing that it's a bug.

You are right. It's not a bug.  However, I would not be surprised if GCC or 
CLANG would shriek at the higher warning levels, usually not being able to 
apply artificial intelligence to solve problems that seem simple to humans.
In the interest of quiet compiles, equivalent code that does not produce a
warning seems like a good idea.  IMO-YMMV.

   regards, tom lane







[HACKERS] TODO links broken?

2013-04-16 Thread Stephen Scheck
Hello,

Many of the links in the TODO wiki page result in a "page not found" error.
Is this page up-to-date?
Can anything be inferred about the status of these items from the broken
links?

Thanks.

http://wiki.postgresql.org/wiki/Todo