[HACKERS] TODO links broken?

2013-04-16 Thread Stephen Scheck
Hello,

Many of the links in the TODO wiki page result in a "page not found" error.
Is this page up-to-date?
Can anything be inferred about the status of these items from the broken
links?

Thanks.

http://wiki.postgresql.org/wiki/Todo


Re: [HACKERS] ancient sequence point bug

2013-04-16 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, April 16, 2013 7:52 PM
To: Peter Eisentraut
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] ancient sequence point bug

>Peter Eisentraut  writes:
>> This code in bootstrap.c contains a sequence point violation (or 
>> whatever that is really called):
>
>> while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
>> {
>>     (*app)->am_oid = HeapTupleGetOid(tup);
>>     memmove((char *) &(*app++)->am_typ,
>>             (char *) GETSTRUCT(tup),
>>             sizeof((*app)->am_typ));
>> }
>
>What exactly is the violation?  sizeof() is a side-effect-free compile time 
>constant, and the first value to be passed to memmove seems perfectly well 
>defined.

Unless it is C99 and the operand is a VLA, in which case sizeof must be
evaluated at run time.  I guess that (*app)->am_typ is not a VLA, especially
since PostgreSQL does not require C99 to compile.

>I grant that this is not terribly good coding style, but I don't believe 
>there's an actual risk here.

I agree that there is no risk in the sizeof operator.
According to my understanding, even this is legal:

unsigned long long *p = 0;
size_t stupid = sizeof (*p);
printf("size of a long long is %u\n", (unsigned) stupid);

But I also guess that most compilers will have a cow when scanning it.

>> In commit 1aebc361, another place in the same file was fixed like this:
>
>Well, I don't really remember why I did that twelve years ago, but seeing that 
>there are a number of purely-cosmetic changes in that commit, I'm thinking it 
>was only meant to be cosmetic.
>
>I certainly have no objection to making this code look more like that code, 
>I'm just not seeing that it's a bug.

You are right. It's not a bug.  However, I would not be surprised if GCC or
Clang shrieked at the higher warning levels, since compilers usually cannot
apply the kind of reasoning that seems simple to humans.
In the interest of "quiet compiles", equivalent code that does not produce a
warning seems like a good idea.  IMO-YMMV.

>   regards, tom lane
>
>
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ancient sequence point bug

2013-04-16 Thread Tom Lane
Peter Eisentraut  writes:
> This code in bootstrap.c contains a sequence point violation (or
> whatever that is really called):

> while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
> {
>     (*app)->am_oid = HeapTupleGetOid(tup);
>     memmove((char *) &(*app++)->am_typ,
>             (char *) GETSTRUCT(tup),
>             sizeof((*app)->am_typ));
> }

What exactly is the violation?  sizeof() is a side-effect-free compile
time constant, and the first value to be passed to memmove seems
perfectly well defined.

I grant that this is not terribly good coding style, but I don't believe
there's an actual risk here.

> In commit 1aebc361, another place in the same file was fixed like this:

Well, I don't really remember why I did that twelve years ago, but
seeing that there are a number of purely-cosmetic changes in that
commit, I'm thinking it was only meant to be cosmetic.

I certainly have no objection to making this code look more like that
code, I'm just not seeing that it's a bug.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ancient sequence point bug

2013-04-16 Thread Peter Eisentraut
This code in bootstrap.c contains a sequence point violation (or
whatever that is really called):

while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
    (*app)->am_oid = HeapTupleGetOid(tup);
    memmove((char *) &(*app++)->am_typ,
            (char *) GETSTRUCT(tup),
            sizeof((*app)->am_typ));
}

In commit 1aebc361, another place in the same file was fixed like this:

@@ -445,13 +455,13 @@ struct typmap
while (HeapTupleIsValid(tup = heap_getnext(scan, 0)))
{
(*app)->am_oid = tup->t_data->t_oid;
-   memmove((char *) &(*app++)->am_typ,
-   (char *) GETSTRUCT(tup),
-   sizeof((*app)->am_typ));
+   memcpy((char *) &(*app)->am_typ,
+  (char *) GETSTRUCT(tup),
+  sizeof((*app)->am_typ));
+   app++;
}
heap_endscan(scan);
heap_close(rel, NoLock);

I think the same (move the app++, and change to memcpy (optionally))
should be done in the first case as well.
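
For concreteness, the first loop would then look something like this
(untested sketch, just following the pattern of the 1aebc361 fix):

while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
    (*app)->am_oid = HeapTupleGetOid(tup);
    /* copy the tuple's struct data, then advance the pointer separately */
    memcpy((char *) &(*app)->am_typ,
           (char *) GETSTRUCT(tup),
           sizeof((*app)->am_typ));
    app++;
}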



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr16, 2013, at 23:41 , Ants Aasma  wrote:
> On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug  wrote:
>> On Apr13, 2013, at 17:14 , Ants Aasma  wrote:
>>> Based on current analysis, it is particularly good at detecting single
>>> bit errors, as good at detecting burst errors as can be expected from
>>> 16 bits and not horrible at detecting burst writes of zeroes. It is
>>> quite bad at detecting multiple uncorrelated single bit errors and
>>> extremely bad at detecting repeating patterns of errors in low order
>>> bits.
>> 
>> I've read the patch and tried to understand why it's that bad at
>> detecting repeating patterns of errors in low order bits, and to see
>> if there might be a way to fix that without too much of a performance
>> impact.
>> 
>> Here's what I gather the algorithm does:
>> 
>>  It treats the input data, a page of L bytes, as a Nx64 matrix V
>>  of 16-bit quantities (N = L/64, obviously).
>>  It then first computes (using two primes p (PRIME1) and q (PRIME2))
>> 
>>S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
>>  + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
>>  + …
>>  + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
>>  (mod 2^16)
>>  = sum V[i,j]*p^(64-i)*q^(64-j)
>> 
>>   Note that it does that by first computing the row-wise sums without
>>   the q^i coefficient, and then (in what the code calls the aggregation
>>   phase) combines those row-wise sums into a total, adding the q^i-
>>   coefficients along the way.
>> 
>>   The final hash value is then
>> 
>> H = S * p + B * q mod 2^16
>> 
>>   where B is a salt value intended to detect swapped pages (currently
>>   B is simply the page index)
> 
> Great job analyzing the analytic form of the algorithm, and sorry you
> had to do it yourself instead of finding it in the documentation.

No problem, glad if I can help!

>> This raises two questions. First, why are there two primes? You could
>> just as well use a single prime q and set p=q^64 mod 2^16. You then
>> get
>>  S = sum V[i,j] * q^(64*(64-i) + (64-j))
>>= sum V[i,j] * q^(4096 - 64*(i-1) - j)
>> You get higher prime powers that way, but you can easily choose a prime
>> that yields distinct values mod 2^16 for exponents up to 16383. Your
>> PRIME2, for example, does. (It wraps around for 16384, i.e.
>> PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
>> 16384 is the Carmichael function's value at 2^16)
> 
> The experimental detection rate is about the same if we use a single
> prime. But I think you have the analytical form wrong here. It should
> be, given q = p:
> 
>S = sum V[i,j] * p^(64-i) * p^(64-j)
>  = sum V[i,j] * p^(64 - i + 64 - j)
>  = sum V[i,j] * p^(128 - i -j)

Yeah, if you set q = p that's true. My suggestion was p=q^64 though...

>> Second, why does it use addition instead of XOR? It seems that FNV
>> usually XORs the terms together instead of adding them?
> 
> Testing showed slightly better detection rate for adds. Intuitively I
> think it's because the carry introduces some additional mixing.

Hm, but OTOH it makes S linear in V, i.e. if you have two inputs
V1,V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then
S' = S*m. The second property is quite undesirable, I think. Assume
all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit
positions 0..(k-1). Then, due to linearity, S is also divisible by
2^k, i.e. also has no ones before the k-th bit. This means, for example,
that if you hash values which all have their lowest bit cleared,
you get only 2^15 distinct hash values. If they all have the two
lowest bits cleared, you get only 2^14 distinct values, and so on…

Generally, linearity doesn't seem to be a property that one wants
in a hash I think, so my suggestion is to stick to XOR.

>> Here, btw, is a page on FNV hashing. It mentions a few rules for
>> picking suitable primes
>> 
>> http://www.isthe.com/chongo/tech/comp/fnv
> 
> Unfortunately the rules don't apply here because of the hash size.

Yeah :-(.

I noticed that their 32-bit prime only has a single one outside
the first 16 bits. Maybe we can take advantage of that and use a
32-bit state while still providing decent performance on machines
without a 32-bit x 32-bit -> 32-bit multiply instruction?

If we lived in an Intel-only world, I'd suggest going with a
32-bit state, since SSE4.1 support is *very* widespread already -
the last CPUs without it came out over 5 years ago, I think.
(Core2 and later support SSE4.1, and some later Core1 do too)

But unfortunately things look bleak even for other x86
implementations - AMD supports SSE4.1 only starting with
Bulldozer, which came out in 2011 or so, I believe. Leaving the x86
realm, it seems that only ARM's NEON provides the instructions
we'd need - AltiVec seems to support only 16-bit multiplies,
and from what some quick googling brought up, MIPS and SPARC
SIMD instructions look no better..

OTOH, chances are that nobody will ever 

[HACKERS] Change to pg_test_fsync output

2013-04-16 Thread Bruce Momjian
I propose the attached patch to change pg_test_fsync's output from
"microsecs" to "usecs", which is the designation we use in other places.
Also remove parentheses, e.g. 

 1 * 16kB open_sync write           8012.933 ops/sec     125 usecs/op
 2 *  8kB open_sync writes          5399.901 ops/sec     185 usecs/op
 4 *  4kB open_sync writes          3340.958 ops/sec     299 usecs/op
 8 *  2kB open_sync writes          1903.621 ops/sec     525 usecs/op
16 *  1kB open_sync writes          1032.606 ops/sec     968 usecs/op

This is new output for 9.3.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 7bc0d0e..b978d9e
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***
*** 25,31 
  
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec (%6.0f microsecs/op)"
  #define USECS_SEC			1000000
  
  /* These are macros to avoid timing the function call overhead. */
--- 25,31 
  
  #define LABEL_FORMAT		"%-32s"
  #define NA_FORMAT			"%18s"
! #define OPS_FORMAT			"%9.3f ops/sec  %6.0f usecs/op"
  #define USECS_SEC			1000000
  
  /* These are macros to avoid timing the function call overhead. */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PROPOSAL: tracking aggregated numbers from pg_stat_database

2013-04-16 Thread Greg Smith

On 4/13/13 12:44 PM, Tomas Vondra wrote:

I'm currently struggling with (quite uncommon) deployments where
databases are created/dropped regularly (not to mention tables and
indexes), and it's surprisingly difficult to process such stats to get
reasonable values.


Yes, it's a pain.  If you aggregate the table level data available now, 
you'll end up with some missing activity.  Work done between the last 
snapshot and when the drop happened is gone right now, whereas your 
aggregated stats view would preserve that activity.  The real fun is if 
the new table has the same name as the old one, which gives you all the 
negative value headaches a pg_stat_reset introduces too.


It's possible to make a case for why database level aggregate statistics 
are useful.  I don't know that yours is compelling enough to justify 
itself though, given that as you say this is an uncommon situation.  In 
something similar to your setup I've just accepted that I have to save 
the snapshots into a table, will occasionally lose some mid-snapshot 
data, and I use a combination of correction updates to that table and 
SQL window functions to provide a useful view.  It's a pain to do and I 
end up having to customize this approach for seemingly every install, 
but it can be made accurate enough for most uses.


The gigantic hole in this area I was most interested in for 9.4 
development was the lack of write statistics.  Not having 
pg_stat_database.blks_write, pg_statio_user_tables.heap_blks_write or 
pg_statio_user_indexes.idx_blks_write is a crippling operations/planning 
limitation of the database.  From that perspective, now isn't quite the 
right time to add more aggregation on top of that data, since the 
aggregation will make adding additional counters a bit more complicated. 
 It's not a big difference, but thinking in that direction doesn't help 
your suggestion.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug  wrote:
> On Apr13, 2013, at 17:14 , Ants Aasma  wrote:
>> Based on current analysis, it is particularly good at detecting single
>> bit errors, as good at detecting burst errors as can be expected from
>> 16 bits and not horrible at detecting burst writes of zeroes. It is
>> quite bad at detecting multiple uncorrelated single bit errors and
>> extremely bad at detecting repeating patterns of errors in low order
>> bits.
>
> I've read the patch and tried to understand why it's that bad at
> detecting repeating patterns of errors in low order bits, and to see
> if there might be a way to fix that without too much of a performance
> impact.
>
> Here's what I gather the algorithm does:
>
>   It treats the input data, a page of L bytes, as a Nx64 matrix V
>   of 16-bit quantities (N = L/64, obviously).
>   It then first computes (using two primes p (PRIME1) and q (PRIME2))
>
> S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
>   + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
>   + …
>   + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
>   (mod 2^16)
>   = sum V[i,j]*p^(64-i)*q^(64-j)
>
>Note that it does that by first computing the row-wise sums without
>the q^i coefficient, and then (in what the code calls the aggregation
>phase) combines those row-wise sums into a total, adding the q^i-
>coefficients along the way.
>
>The final hash value is then
>
>  H = S * p + B * q mod 2^16
>
>where B is a salt value intended to detect swapped pages (currently
>B is simply the page index)

Great job analyzing the analytic form of the algorithm, and sorry you
had to do it yourself instead of finding it in the documentation.

> This raises two questions. First, why are there two primes? You could
> just as well use a single prime q and set p=q^64 mod 2^16. You then
> get
>   S = sum V[i,j] * q^(64*(64-i) + (64-j))
> = sum V[i,j] * q^(4096 - 64*(i-1) - j)
> You get higher prime powers that way, but you can easily choose a prime
> that yields distinct values mod 2^16 for exponents up to 16383. Your
> PRIME2, for example, does. (It wraps around for 16384, i.e.
> PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
> 16384 is the Carmichael function's value at 2^16)

The experimental detection rate is about the same if we use a single
prime. But I think you have the analytical form wrong here. It should
be, given q = p:

S = sum V[i,j] * p^(64-i) * p^(64-j)
  = sum V[i,j] * p^(64 - i + 64 - j)
  = sum V[i,j] * p^(128 - i -j)

This makes the whole matrix symmetric. While I can't think of any real
world errors that would exhibit symmetry in this 64x64 matrix, it
seemed better to me to avoid the issue altogether and use different
primes. IIRC it helped a lot for the case of page[i] = i & 0xFF.

> Second, why does it use addition instead of XOR? It seems that FNV
> usually XORs the terms together instead of adding them?

Testing showed slightly better detection rate for adds. Intuitively I
think it's because the carry introduces some additional mixing.

> Regarding the bad behaviour for multiple low-bit errors - can you
> explain why it behaves badly in that case? I currently fail to see
> why that would be. I *can* see that the lowest bit of the hash depends
> only on the lowest bit of the input words, but as long as the lowest
> bits of the input words also affect other bits of the hash, that shouldn't
> matter. Which I think they do, but I might be missing something...

Looks like you're right. I was somehow concentrating only on how the
lowest bits depend on the input.

> Here, btw, is a page on FNV hashing. It mentions a few rules for
> picking suitable primes
>
> http://www.isthe.com/chongo/tech/comp/fnv

Unfortunately the rules don't apply here because of the hash size.

Thanks for your analysis. I will do my best to get this all into a
document and will do some more analysis to see if I can come up with
some kind of proof for the error cases.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Fabrízio de Royes Mello
On Tue, Apr 16, 2013 at 6:09 PM, Tom Lane  wrote:

>
> [...]
>
> Or, if you'd rather a more direct answer: wanting this sounds like
> evidence of bad application design.  Why is your app dependent on
> getting failures from currval, and isn't there a better way to do it?
>
>
The sequence cache (seqtab) is kept per backend, so if we use a
connection pooler (like pgbouncer in session mode) between our app and
Postgres, we can get a wrong value from 'currval' because the backend isn't
completely clean.

Isn't this a good reason to implement this feature?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Ants Aasma  writes:
> On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs  wrote:
>> My only review comments are to ask for some explanation of the magic 
>> numbers...

> The specific values used are mostly magic to me too. As mentioned in a
> short sentence in the patch, the values are experimentally chosen,
> guided by some intuition about what good values should look like.

There actually is quite a lot of theory out there about this sort of
thing.  If we are inventing our own checksum function then We're Doing
It Wrong.  We should be adopting an existing, proven function.
"Experimentally derived" is about the worst recommendation I can think
of in this area.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Tom Lane
Bruce Momjian  writes:
> I think his point is why don't we clear currval() on DISCARD ALL?  I
> can't think of a good reason we don't.

Because we'd have to invent a new suboperation DISCARD SEQUENCES,
for one thing, in order to be consistent.  I'd rather ask why it's
important that we should throw away such state.  It doesn't seem to
me to be important enough to justify a new subcommand.

Or, if you'd rather a more direct answer: wanting this sounds like
evidence of bad application design.  Why is your app dependent on
getting failures from currval, and isn't there a better way to do it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs  wrote:
> On 9 April 2013 03:35, Ants Aasma  wrote:
>> On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma  wrote:
>>> Unless somebody tells me not to waste my time I'll go ahead and come
>>> up with a workable patch by Monday.
>>
>> And here you go. I decided to be verbose with the comments as it's
>> easier to delete a comment than to write one. I also left in a huge jumble
>> of macros to calculate the contents of a helper var during compile
>> time. This can easily be replaced with the calculated values once we
>> settle on specific parameters.
>>
>> Currently only x86-64 is implemented. 32bit x86 would be mostly a
>> copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
>> For other platforms the simplest way would be to use a vectorizing
>> compiler on the generic variant. -funroll-loops -ftree-vectorize is
>> enough on gcc.
>>
>> Quick bench results on the worst case workload:
>> master no checksums: tps = 15.561848
>> master with checksums: tps = 1.695450
>> simd checksums: tps = 14.602698
>
> Numbers look very good on this. Well done.
>
> I support the direction of this, but I don't think I'm sufficiently
> well qualified to verify that the code does what it should and/or fix
> it if it breaks. If others want to see this happen you'll need to
> pitch in.
>
> My only review comments are to ask for some explanation of the magic 
> numbers...
> #define CSUM_PRIME1 0x49
> #define CSUM_PRIME2 0x986b
> #define CSUM_TRUNC 65521
>
> Where magic means a level of technology far above my own
> understanding, and yet no (or not enough) code comments to assist me.

The specific values used are mostly magic to me too. As mentioned in a
short sentence in the patch, the values are experimentally chosen,
guided by some intuition about what good values should look like.

Basically the methodology for the choice was that I took all the pages
from a 2.8GB test database, and then for each page introduced a bunch
of common errors and observed how many errors were undetected. The
main observations were:
1) the exact value of the primes doesn't really matter for detection
   efficiency.
2) values with a non-uniform distribution of zeroes and ones seem to
   work slightly better.
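
To give an idea of what that looks like in practice, here's a rough
sketch of the kind of harness I mean for one error class (single bit
flips). This is only an illustration with a stand-in checksum function,
not the actual test code:

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Stand-in for the checksum being evaluated; substitute the real one. */
static uint16_t
checksum(const uint8_t *page)
{
	uint32_t	s = 0;
	int			i;

	for (i = 0; i < PAGE_SIZE; i++)
		s = s * 31 + page[i];
	return (uint16_t) s;
}

/* Count how many single-bit flips on this page go undetected. */
static int
undetected_single_bit_errors(const uint8_t *page)
{
	uint8_t		copy[PAGE_SIZE];
	uint16_t	orig = checksum(page);
	int			byte, bit, undetected = 0;

	for (byte = 0; byte < PAGE_SIZE; byte++)
		for (bit = 0; bit < 8; bit++)
		{
			memcpy(copy, page, PAGE_SIZE);
			copy[byte] ^= (uint8_t) (1 << bit);
			if (checksum(copy) == orig)
				undetected++;
		}
	return undetected;
}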

I'll write up a readme of why the values are how they are and with
some more explanation of the algorithm.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 12:13:48PM -0700, Adrian Klaver wrote:
> On 04/16/2013 08:07 AM, Nigel Heron wrote:
> >
> >On 04/15/2013 05:57 PM, Adrian Klaver wrote:
> >>On 04/15/2013 02:42 PM, Nigel Heron wrote:
> >>>Hi,
> >>>is there a way to clear the session state of sequence values fetched by
> >>>currval(regclass)? "DISCARD ALL" doesn't seem to do it.
> >>>
> >
> >>Might want to take a look at:
> >>
> >>http://www.depesz.com/2012/12/02/what-is-the-point-of-bouncing/
> >>
> >>for some hints on dealing with sequences and pgBouncer.
> >>
> >
> >thanks, I read it (his blogs are always interesting!). I'm not disputing
> >that calling currval() at the wrong time is a bad idea.
> >I'm just wondering why DISCARD ALL clears everything but this?
> 
> Well per the docs:
> 
> http://www.postgresql.org/docs/9.2/interactive/sql-discard.html
> 
> DISCARD ALL
> 
> is equivalent to:
> 
> SET SESSION AUTHORIZATION DEFAULT;
> RESET ALL;
> DEALLOCATE ALL;
> CLOSE ALL;
> UNLISTEN *;
> SELECT pg_advisory_unlock_all();
> DISCARD PLANS;
> DISCARD TEMP;
> 
> AFAIK, none of the above affect sequences.

I think his point is why don't we clear currval() on DISCARD ALL?  I
can't think of a good reason we don't.  He is saying currval() should
throw an error after DISCARD ALL:

test=> SELECT currval('seq');
ERROR:  currval of sequence "seq" is not yet defined in this session

I have moved this thread to hackers to get comments.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr13, 2013, at 17:14 , Ants Aasma  wrote:
> Based on current analysis, it is particularly good at detecting single
> bit errors, as good at detecting burst errors as can be expected from
> 16 bits and not horrible at detecting burst writes of zeroes. It is
> quite bad at detecting multiple uncorrelated single bit errors and
> extremely bad at detecting repeating patterns of errors in low order
> bits.

I've read the patch and tried to understand why it's that bad at
detecting repeating patterns of errors in low order bits, and to see
if there might be a way to fix that without too much of a performance
impact.

Here's what I gather the algorithm does:

  It treats the input data, a page of L bytes, as a Nx64 matrix V
  of 16-bit quantities (N = L/64, obviously).
  It then first computes (using two primes p (PRIME1) and q (PRIME2))

S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0
  + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … + V[2,64]*p^62*q^0
  + …
  + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0
  (mod 2^16)
  = sum V[i,j]*p^(64-i)*q^(64-j)

   Note that it does that by first computing the row-wise sums without
   the q^i coefficient, and then (in what the code calls the aggregation
   phase) combines those row-wise sums into a total, adding the q^i-
   coefficients along the way.

   The final hash value is then

 H = S * p + B * q mod 2^16

   where B is a salt value intended to detect swapped pages (currently
   B is simply the page index)
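
To pin down the notation, here is a naive, completely unoptimized C
rendering of that formula. This is my own sketch for illustration only,
not how the patch actually computes it; I'm assuming N = 64 (i.e. an
8 kB page viewed as 64x64 16-bit values), and the two prime values are
just placeholders:

#include <stdint.h>

#define ROWS 64
#define COLS 64
#define PRIME_P 0x49      /* placeholder value */
#define PRIME_Q 0x986b    /* placeholder value */

/* base^exp mod 2^16, by repeated multiplication */
static uint16_t
pow16(uint16_t base, int exp)
{
	uint32_t	r = 1;

	while (exp-- > 0)
		r = (r * base) & 0xFFFF;
	return (uint16_t) r;
}

/* S = sum V[i,j] * p^(64-i) * q^(64-j); H = S*p + B*q; all mod 2^16 */
static uint16_t
checksum_sketch(const uint16_t v[ROWS][COLS], uint16_t block_no)
{
	uint32_t	s = 0;
	int			i, j;

	for (i = 0; i < ROWS; i++)
		for (j = 0; j < COLS; j++)
			s += (uint32_t) v[i][j] *
				 pow16(PRIME_P, ROWS - 1 - i) *
				 pow16(PRIME_Q, COLS - 1 - j);

	return (uint16_t) ((s & 0xFFFF) * PRIME_P + (uint32_t) block_no * PRIME_Q);
}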

This raises two questions. First, why are there two primes? You could
just as well use a single prime q and set p=q^64 mod 2^16. You then
get
  S = sum V[i,j] * q^(64*(64-i) + (64-j))
= sum V[i,j] * q^(4096 - 64*(i-1) - j)
You get higher prime powers that way, but you can easily choose a prime
that yields distinct values mod 2^16 for exponents up to 16383. Your
PRIME2, for example, does. (It wraps around for 16384, i.e.
PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
16384 is the Carmichael function's value at 2^16)

Second, why does it use addition instead of XOR? It seems that FNV
usually XORs the terms together instead of adding them?

Regarding the bad behaviour for multiple low-bit errors - can you
explain why it behaves badly in that case? I currently fail to see
why that would be. I *can* see that the lowest bit of the hash depends
only on the lowest bit of the input words, but as long as the lowest
bits of the input words also affect other bits of the hash, that shouldn't
matter. Which I think they do, but I might be missing something...

Here, btw, is a page on FNV hashing. It mentions a few rules for
picking suitable primes

http://www.isthe.com/chongo/tech/comp/fnv

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 16 April 2013 20:27, Tom Lane  wrote:
> Greg Stark  writes:
>> On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs  wrote:
>>> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
>>> We reuse the WAL files, so the prev field in each WAL record shows
>>> what the previous end of WAL was. Hence if the WAL checksums give a
>>> false positive we still have a double check that the data really is
>>> wrong. It's unbelievable that you'd get a false positive and then have
>>> the prev field match as well, even though it was the genuine
>>> end-of-WAL.
>
>> This is kind of true and kind of not true. If a system loses power
>> while writing lots of data to WAL then the blocks at the end of the
>> WAL might not be written out in order. Everything since the last log
>> sync might be partly written out and partly not written out. That's
>> the case where the checksum is critical. The beginning of a record
>> could easily be written out including xl_prev and the end of the
>> record not written. 1/64,000 power losses would then end up with an
>> assertion failure or corrupt database.
>
> I have a hard time believing that it's a good idea to add checksums to
> data pages and at the same time weaken our ability to detect WAL
> corruption.  So this seems to me to be going in the wrong direction.
> What's it buying for us anyway?  A few CPU cycles saved during WAL
> generation?  That's probably swamped by the other costs of writing WAL,
> especially if you're using replication.

This part of the thread is dead now; I said "let's drop this idea"
on 13 April.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Greg Stark  writes:
> On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs  wrote:
>> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
>> We reuse the WAL files, so the prev field in each WAL record shows
>> what the previous end of WAL was. Hence if the WAL checksums give a
>> false positive we still have a double check that the data really is
>> wrong. It's unbelievable that you'd get a false positive and then have
>> the prev field match as well, even though it was the genuine
>> end-of-WAL.

> This is kind of true and kind of not true. If a system loses power
> while writing lots of data to WAL then the blocks at the end of the
> WAL might not be written out in order. Everything since the last log
> sync might be partly written out and partly not written out. That's
> the case where the checksum is critical. The beginning of a record
> could easily be written out including xl_prev and the end of the
> record not written. 1/64,000 power losses would then end up with an
> assertion failure or corrupt database.

I have a hard time believing that it's a good idea to add checksums to
data pages and at the same time weaken our ability to detect WAL
corruption.  So this seems to me to be going in the wrong direction.
What's it buying for us anyway?  A few CPU cycles saved during WAL
generation?  That's probably swamped by the other costs of writing WAL,
especially if you're using replication.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger API documentation?

2013-04-16 Thread Peter Eisentraut
On 4/16/13 7:49 AM, Robert Haas wrote:
> On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut  wrote:
>> I'm having trouble finding documentation about how to write event
>> triggers.  The chapter in the documentation
>>
>> http://www.postgresql.org/docs/devel/static/event-triggers.html
>>
>> says they can be written in C or supported PLs, but does not explain it
>> any further.  Is there any documentation for it?
> 
> That chapter has two subsections that may be useful, and there's a bit
> more here:
> 
> http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
> 
> Now that you mention it, though, it looks a little sparse.

I'm specifically looking for C API documentation, along the lines of
http://www.postgresql.org/docs/devel/static/trigger-interface.html.

The current chapter on event triggers might as well be ripped out and
folded into the CREATE EVENT TRIGGER reference page, because it explains
nothing about programming those triggers.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Greg Stark
On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs  wrote:
> * WAL checksum is not used as the sole basis for end-of-WAL discovery.
> We reuse the WAL files, so the prev field in each WAL record shows
> what the previous end of WAL was. Hence if the WAL checksums give a
> false positive we still have a double check that the data really is
> wrong. It's unbelievable that you'd get a false positive and then have
> the prev field match as well, even though it was the genuine
> end-of-WAL.

This is kind of true and kind of not true. If a system loses power
while writing lots of data to WAL then the blocks at the end of the
WAL might not be written out in order. Everything since the last log
sync might be partly written out and partly not written out. That's
the case where the checksum is critical. The beginning of a record
could easily be written out including xl_prev and the end of the
record not written. 1/64,000 power losses would then end up with an
assertion failure or corrupt database.



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 16 April 2013 15:07, Tom Lane  wrote:
> Bruce Momjian  writes:
>> I found Simon's nextval()/COPY timings without this patch sobering.  I
>> assume he can apply this for 9.3, right?  I believe it is a fix for a
>> new 9.3 feature.
>
> It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
> went in in 9.2), and personally I'd vote against applying it now,
> especially if Simon is expecting anybody to review it.

I wrote this expecting it to be a 9.4 feature. I'm good with that.

There are other somewhat related optimisations also.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 10:07:07AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I found Simon's nextval()/COPY timings without this patch sobering.  I
> > assume he can apply this for 9.3, right?  I believe it is a fix for a
> > new 9.3 feature.
> 
> It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
> went in in 9.2), and personally I'd vote against applying it now,
> especially if Simon is expecting anybody to review it.

Oh, I thought multi-insert was 9.3.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Tom Lane
Bruce Momjian  writes:
> I found Simon's nextval()/COPY timings without this patch sobering.  I
> assume he can apply this for 9.3, right?  I believe it is a fix for a
> new 9.3 feature.

It is not a "fix", it is not for a 9.3 feature (the multi-insert thing
went in in 9.2), and personally I'd vote against applying it now,
especially if Simon is expecting anybody to review it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 9 April 2013 03:35, Ants Aasma  wrote:
> On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma  wrote:
>> Unless somebody tells me not to waste my time I'll go ahead and come
>> up with a workable patch by Monday.
>
> And here you go. I decided to be verbose with the comments as it's
> easier to delete a comment than to write one. I also left in a huge jumble
> of macros to calculate the contents of a helper var during compile
> time. This can easily be replaced with the calculated values once we
> settle on specific parameters.
>
> Currently only x86-64 is implemented. 32bit x86 would be mostly a
> copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
> For other platforms the simplest way would be to use a vectorizing
> compiler on the generic variant. -funroll-loops -ftree-vectorize is
> enough on gcc.
>
> Quick bench results on the worst case workload:
> master no checksums: tps = 15.561848
> master with checksums: tps = 1.695450
> simd checksums: tps = 14.602698

Numbers look very good on this. Well done.

I support the direction of this, but I don't think I'm sufficiently
well qualified to verify that the code does what it should and/or fix
it if it breaks. If others want to see this happen you'll need to
pitch in.

My only review comments are to ask for some explanation of the magic numbers...
#define CSUM_PRIME1 0x49
#define CSUM_PRIME2 0x986b
#define CSUM_TRUNC 65521

Where magic means a level of technology far above my own
understanding, and yet no (or not enough) code comments to assist me.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Bruce Momjian
On Tue, Apr 16, 2013 at 02:37:33PM +0100, Simon Riggs wrote:
> On 16 April 2013 13:57, Heikki Linnakangas  wrote:
> 
> > You still need to check the args, if the function is nextval, otherwise you
> > incorrectly perform the optimization for something like
> > "nextval(myvolatilefunc())".
> 
> Guess so. At least it's an easy change.

I found Simon's nextval()/COPY timings without this patch sobering.  I
assume he can apply this for 9.3, right?  I believe it is a fix for a
new 9.3 feature.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 16 April 2013 13:57, Heikki Linnakangas  wrote:

> You still need to check the args, if the function is nextval, otherwise you
> incorrectly perform the optimization for something like
> "nextval(myvolatilefunc())".

Guess so. At least it's an easy change.

Thanks for checking.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_tuning_with_default_nextval.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Heikki Linnakangas

On 16.04.2013 14:38, Simon Riggs wrote:

On 15 April 2013 21:53, Simon Riggs  wrote:


So I'll treat this as two separate cases:
* add special case for sequences


Patch attached.

+   if (IsA(node, FuncExpr))
+   {
+   FuncExpr   *expr = (FuncExpr *) node;
+
+   /*
+* For this case only, we want to ignore the volatility of the
+* nextval() function, since some callers want this. nextval()
+* has no other args other than sequence name, so we can just
+* return false immediately in this case.
+*/
+   if (expr->funcid == F_NEXTVAL_OID)
+   return false;
+   else if (func_volatile(expr->funcid) == PROVOLATILE_VOLATILE)
+   return true;
+   /* else fall through to check args */
+   }  ...


You still need to check the args, if the function is nextval, otherwise 
you incorrectly perform the optimization for something like 
"nextval(myvolatilefunc())".

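Something along these lines, perhaps (untested sketch):

if (IsA(node, FuncExpr))
{
    FuncExpr   *expr = (FuncExpr *) node;

    /*
     * Treat nextval() itself as acceptable, but still fall through and
     * examine its arguments, so that nextval(myvolatilefunc()) is still
     * seen as volatile.
     */
    if (expr->funcid != F_NEXTVAL_OID &&
        func_volatile(expr->funcid) == PROVOLATILE_VOLATILE)
        return true;
    /* else fall through to check args */
}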

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger API documentation?

2013-04-16 Thread Robert Haas
On Mon, Apr 15, 2013 at 11:57 PM, Peter Eisentraut  wrote:
> I'm having trouble finding documentation about how to write event
> triggers.  The chapter in the documentation
>
> http://www.postgresql.org/docs/devel/static/event-triggers.html
>
> says they can be written in C or supported PLs, but does not explain it
> any further.  Is there any documentation for it?

That chapter has two subsections that may be useful, and there's a bit
more here:

http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER

Now that you mention it, though, it looks a little sparse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY and Volatile default expressions

2013-04-16 Thread Simon Riggs
On 15 April 2013 21:53, Simon Riggs  wrote:

> So I'll treat this as two separate cases:
> * add special case for sequences

Patch attached.

> * use the NO SQL mechanism, as described, which implies no reads or
> writes of database state. We could test that, but its somewhat harder
> and we'd need to test for that on entry to any function, which I don't
> much like.

For another time.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_tuning_with_default_nextval.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Add \ns command to psql

2013-04-16 Thread Colin 't Hart
Hi,

Here's a new version of a small patch to psql I'm using locally.

It adds a command \ns to psql which is a shortcut to set the
SEARCH_PATH variable.

I'd like to make a case for including this patch as it makes use of
schemas/namespaces much easier. There was resistance to including this
before just because some developers don't use schemas very much. But
we use a lot of them. And I'm sure we're not alone.

Previously I used just \n but there was some resistance to this
because the single letter commands are becoming scarce.

I've also added tab completion making this command much more useful. I
don't think tab completion would be possible if this command were
defined as a variable (which was another suggestion offered at the
time).


Cheers,

Colin


command.c.diff
Description: Binary data


tab-complete.c.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "stored procedures"

2013-04-16 Thread Pavel Stehule
Hello


2013/4/16 aasat 

> Are "stored procedures" planned in the future? I think it is the
> "most missing" feature in Postgres today.
>

It is on the ToDo list, but probably nobody is working on this feature at the moment.


>
> Using dblink to emulate a commit inside a transaction is very complicated
>
>
Probably autonomous transactions will be implemented first - and that should be
a really nice feature.

Regards

Pavel


>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/stored-procedures-tp4331060p5752274.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] "stored procedures"

2013-04-16 Thread aasat
Are "stored procedures" planned in the future? I think it is the "most missing"
feature in Postgres today.

Using dblink to emulate a commit inside a transaction is very complicated




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/stored-procedures-tp4331060p5752274.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers