Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Andres Freund
On 2013-04-18 00:44:02 +0300, Ants Aasma wrote:
> I went ahead and coded up both the parallel FNV-1a and parallel FNV-1a
> + srl1-xor variants and ran performance tests and detection rate tests
> on both.
> 
> Performance results:
> Mul-add checksums: 12.9 bytes/cycle
> FNV-1a checksums: 13.5 bytes/cycle
> FNV-1a + srl-1: 7.4 bytes/cycle
> 
> Detection rates:
> False positive rates:
>                  Add-mul       FNV-1a     FNV-1a + srl-1
> Single bit flip: 1:inf         1:129590   1:64795
> Double bit flip: 1:148         1:511      1:53083
> Triple bit flip: 1:673         1:5060     1:61511
>   Quad bit flip: 1:1872        1:19349    1:68320
> Write 0x00 byte: 1:774538137   1:118776   1:68952
> Write 0xFF byte: 1:165399500   1:137489   1:68958
>   Partial write: 1:59949       1:71939    1:89923
>   Write garbage: 1:64866       1:64980    1:67732
> Write run of 00: 1:57077       1:61140    1:59723
> Write run of FF: 1:63085       1:59609    1:62977
> 
> Test descriptions:
> N bit flip: picks N random non-overlapping bits and flips their value.
> Write X byte: overwrites a single byte with X.
> Partial write: picks a random cut point, overwrites everything from
> there to end with 0x00.
> Write garbage/run of X: picks two random cut points and fills
> everything in between with random values/X bytes.

I don't think this table is complete without competing numbers for
truncated crc-32. Any chance to get that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Andres Freund
On 2013-04-17 18:16:36 -0700, Daniel Farina wrote:
> The original paper is often shorthanded "Castagnoli 93", but it exists
> in the IEEE's sphere of influence and is hard to find a copy of.
> Luckily, a pretty interesting survey paper discussing some of the
> issues was written by Koopman in 2002 and is available:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a
> pedagogical note, it's a pretty interesting and accessible piece of
> writing (for me, as someone who knows little of error
> detection/correction) and explains some of the engineering reasons
> that provoke such exercises.

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=231911&userType=inst

There's also a Koopman paper from 2004 that's interesting.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Word-level bigrams/trigrams in tsvector

2013-04-17 Thread Alan Li
I'm wondering how I can store word-level bigrams/trigrams in a tsvector
that I can query against. I was expecting the final query to match "the
air" and return the one tuple to me.

For instance:

postgres=# create table docs (a tsvector);
CREATE TABLE
postgres=# insert into docs (a) values (strip('''the air'' smells ''sea
water'''::tsvector));
INSERT 0 1
postgres=# select * from docs;
                a
---------------------------------
 'sea water' 'smells' 'the air'
(1 row)

postgres=# select * from docs where a @@ to_tsquery('''the air''');
 a
---
(0 rows)

Thanks, Alan


Re: [HACKERS] confusing message about archive failures

2013-04-17 Thread Jeff Janes
On Wednesday, April 17, 2013, Peter Eisentraut wrote:

> When archive_command fails three times, it prints this message into the
> logs:
>
> "transaction log file \"%s\" could not be archived: too many failures"
>
> This leaves it open what happens next.  What will actually happen is
> that it will usually try again after 60 seconds or so, but the message
> indicates something much more fatal than that.
>
> Could we rephrase this a little bit to make it less dramatic, like
>
> "... too many failures, will try again later"
>
> ?
>

+1  I've found the current message alarming/confusing as well.  But I don't
really understand the logic behind bursting the attempts, 3 of them one
second apart, then sleeping 57 seconds, in the first place.

Cheers,

Jeff


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Smith

On 4/17/13 8:56 PM, Ants Aasma wrote:

> Nothing from the two points, but the CRC calculation algorithm can be
> switched out for a slice-by-4 or slice-by-8 variant. The speedup was around
> a factor of 4 if I remember correctly...I can provide you

> with a patch of the generic version of any of the discussed algorithms
> within an hour, leaving plenty of time in beta or in 9.4 to
> accommodate the optimized versions.

Can you nail down a solid, potentially committable slice-by-4 or slice-by-8 
patch then?  You dropped into things like per-byte overhead to reach 
this conclusion, which was fine to let the methods battle each other. 
Maybe I missed it, but I didn't remember seeing an obvious full patch 
for this implementation then come back up from that.  With the schedule 
pressure this needs to return to more database-level tests.  Your 
concerns about the committed feature being much slower than the original 
Fletcher one are troubling, and we might as well do that showdown again 
now with the best of the CRC implementations you've found.



Actually the state is that with the [CRC] polynomial used there is
currently close to zero hope of CPUs optimizing for us.


Ah, I didn't catch that before.  It sounds like the alternate slicing 
implementation should also use a different polynomial then, which sounds 
reasonable.  This doesn't even have to be exactly the same CRC function 
that the WAL uses.  A CRC that's modified for performance or having a 
better future potential is fine; there's just a lot of resistance to 
using something other than a CRC right now.



I'm not sure about the 9.4 part: if we ship with the builtin CRC as
committed, there is a 100% chance that we will want to switch out the
algorithm in 9.4, and there will be quite a large subset of users that
will find the performance unusable.


Now I have to switch out my reviewer hat for my 3 bit fortune telling 
one.  (It uses a Magic 8 Ball)  This entire approach is squeezing what 
people would prefer to be a 32 bit CRC into a spare 16 bits, as a useful 
step advancing toward a long term goal.  I have four major branches of 
possible futures here I've thought about:


1) Database checksums with 16 bits are good enough, but they have to be 
much faster to satisfy users.  It may take a different checksum 
implementation altogether to make that possible, and distinguishing 
between the two of them requires borrowing even more metadata bits from 
somewhere.  (This seems the future you're worried about)


2) Database checksums work out well, but they have to be 32 bits to 
satisfy users and/or error detection needs.  Work on pg_upgrade and 
expanding the page headers will be needed.  Optimization of the CRC now 
has a full 32 bit target.


3) The demand for database checksums is made obsolete by either 
mainstream filesystem checksumming, performance issues, or just general 
market whim.  The 16 bit checksum PostgreSQL implements becomes a 
vestigial feature, and whenever it gets in the way of making changes 
someone proposes eliminating them.  (I call this one the "rules" future)


4) 16 bit checksums turn out to be such a problem in the field that 
everyone regrets the whole thing, and discussions turn immediately 
toward how to eliminate that risk.


It's fair that you're very concerned about (1), but I wouldn't give it 
100% odds of happening either.  The user demand that's motivated me to 
work on this will be happy with any of (1) through (3), and in two of 
them optimizing the 16 bit checksums now turns out to be premature.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] confusing message about archive failures

2013-04-17 Thread Peter Eisentraut
When archive_command fails three times, it prints this message into the
logs:

"transaction log file \"%s\" could not be archived: too many failures"

This leaves it open what happens next.  What will actually happen is
that it will usually try again after 60 seconds or so, but the message
indicates something much more fatal than that.

Could we rephrase this a little bit to make it less dramatic, like

"... too many failures, will try again later"

?





Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Daniel Farina
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith  wrote:
> Let me see if I can summarize where the messages flying by are at since
> you'd like to close this topic for now:
>
> -Original checksum feature used Fletcher checksums.  Its main problems, to
> quote wikipedia, include that it "cannot distinguish between blocks of all 0
> bits and blocks of all 1 bits".
>
> -Committed checksum feature uses truncated CRC-32.  This has known good
> error detection properties, but is expensive to compute.  There's reason to
> believe that particular computation will become cheaper on future platforms
> though.  But taking full advantage of that will require adding CPU-specific
> code to the database.
>
> -The latest idea is using the Fowler–Noll–Vo hash function:
> https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash  There's 20 years of
> research around when that is good or bad.  The exact properties depend on
> magic "FNV primes":  http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that
> can vary based on both your target block size and how many bytes you'll
> process at a time.  For PostgreSQL checksums, one of the common
> problems--getting an even distribution of the hashed values--isn't important
> the way it is for other types of hashes.  Ants and Florian have now dug into
> how exactly that and specific CPU optimization concerns impact the best
> approach for 8K database pages.  This is very clearly a 9.4 project that is
> just getting started.

I was curious about the activity in this thread and wanted to understand
the tradeoffs, and came to the same understanding as you when poking
around.  It seems the tough aspect of the equation is that the most
well studied thing is slow (CRC-32C) unless you have special ISA
support.  Trying to find as much information and conclusive research on
FNV was a lot more challenging.  Fletcher is similar in that regard.

Given my hasty attempt to understand each of the alternatives, my
qualitative judgement is that, strangely enough, the most conservative
choice of the three (in terms of being understood and treated in the
literature more than ten times over) is CRC-32C, but it's also the one
being cast as suitable only with micro-optimization.  To add
another, theoretically-oriented dimension to the discussion, I'd like
to suggest it's also the most thoroughly studied of all the alternatives.
I really had a hard time finding follow-up papers about the two
alternatives, but to be fair, I didn't try very hard...then again, I
didn't try very hard for any of the three, it's just that CRC32C was
by far the easiest to find materials on.

The original paper is often shorthanded "Castagnoli 93", but it exists
in the IEEE's sphere of influence and is hard to find a copy of.
Luckily, a pretty interesting survey paper discussing some of the
issues was written by Koopman in 2002 and is available:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a
pedagogical note, it's a pretty interesting and accessible piece of
writing (for me, as someone who knows little of error
detection/correction) and explains some of the engineering reasons
that provoke such exercises.

Basically...if it comes down to understanding what the heck is going on
and what the trade-offs are, it was a lot easier to brush up on
CRC32-C in my meandering around the Internet.

One might think this level of scrutiny would constitute a viable
explanation of why CRC32C found its way into several standards and
then finally into silicon.

All in all, if the real world costs of CRC32C on non-SSE4.2 hardware are
allowable, I think it's the most researched and conservative
option, although perhaps some of the other polynomials seen in Koopman
could also be desirable.  It seems there's a tradeoff in CRC
polynomials between long-message and short-message error detection,
and the paper above may allow for a more informed selection.  CRC32C
is considered a good trade-off for both, but I haven't assessed the
paper in enough detail to suggest whether there are specialized
long-run polynomials that may be better still (although, then, there
is also the microoptimization question, which postdates the literature
I was looking at by a lot).




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 3:21 AM, Greg Smith  wrote:
> On 4/17/13 6:32 PM, Tom Lane wrote:
>>
>> The more I read of this thread, the more unhappy I get.  It appears that
>> the entire design process is being driven by micro-optimization for CPUs
>> being built by Intel in 2013.
>
>
> And that's not going to get anyone past review, since all the tests I've
> been doing the last two weeks are on how fast an AMD Opteron 6234 with OS
> cache >> shared_buffers can run this.  The main thing I'm still worried
> about is what happens when you have a fast machine that can move memory
> around very quickly and an in-memory workload, but it's hamstrung by the
> checksum computation--and it's not a 2013 Intel machine.
>
> The question I started with here was answered to some depth and then skipped
> past.  I'd like to jerk attention back to that, since I thought some good
> answers from Ants went by.  Is there a simple way to optimize the committed
> CRC computation (or a similar one with the same error detection properties)
> based on either:
>
> a) Knowing that the input will be an 8K page, rather than the existing use
> case with an arbitrary sized WAL section.
>
> b) Straightforward code rearrangement or optimization flags.
>
> That was all I thought was still feasible to consider changing for 9.3 a few
> weeks ago.  And the possible scope has only been shrinking since then.

Nothing from the two points, but the CRC calculation algorithm can be
switched out for a slice-by-4 or slice-by-8 variant. The speedup was around
a factor of 4 if I remember correctly.

>> And I reiterate that there is theory out there about the error detection
>> capabilities of CRCs.  I'm not seeing any theory here, which leaves me
>> with very little confidence that we know what we're doing.
>
>
> Let me see if I can summarize where the messages flying by are at since
> you'd like to close this topic for now:
>
> -Original checksum feature used Fletcher checksums.  Its main problems, to
> quote wikipedia, include that it "cannot distinguish between blocks of all 0
> bits and blocks of all 1 bits".

That was only the most glaring problem.

> -Committed checksum feature uses truncated CRC-32.  This has known good
> error detection properties, but is expensive to compute.  There's reason to
> believe that particular computation will become cheaper on future platforms
> though.  But taking full advantage of that will require adding CPU-specific
> code to the database.

Actually the state is that with the polynomial currently used there is
close to zero hope of CPUs optimizing for us. By switching the
polynomial we can have hardware acceleration on Intel CPUs, with little
hope of other vendors following, given that AMD hasn't by now and Intel
touts patents in this area. However, the calculation can be made about a
factor of 4 faster by restructuring it. This optimization is plain C and
not CPU specific.
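
To illustrate what that restructuring looks like, here is a rough sketch of
a slicing-by-4 inner loop (the names, layout and the example polynomial -
the reflected CRC-32C one - are mine for illustration, not code from any
patch; slicing-by-8 is the same idea with eight tables):

#include <stdint.h>
#include <stddef.h>

static uint32_t crc_tab[4][256];

static void
crc_init(void)
{
    for (int i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0x82F63B78 : c >> 1;
        crc_tab[0][i] = c;
    }
    /* tab[t][i] = tab[t-1][i] advanced over one extra zero byte */
    for (int i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
                crc_tab[0][crc_tab[t - 1][i] & 0xFF];
}

static uint32_t
crc32_sliced(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFF;

    while (len >= 4)
    {
        /* 4 input bytes per iteration; the 4 table lookups are independent */
        crc ^= (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
               ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        crc = crc_tab[3][crc & 0xFF] ^
              crc_tab[2][(crc >> 8) & 0xFF] ^
              crc_tab[1][(crc >> 16) & 0xFF] ^
              crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    while (len--)                    /* byte-at-a-time tail */
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xFF];
    return crc ^ 0xFFFFFFFF;
}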

The committed checksum is an order of magnitude slower than the
Fletcher one that was performance tested with the patch.

> -The latest idea is using the Fowler–Noll–Vo hash function:
> https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash  There's 20 years of
> research around when that is good or bad.  The exact properties depend on
> magic "FNV primes":  http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that
> can vary based on both your target block size and how many bytes you'll
> process at a time.  For PostgreSQL checksums, one of the common
> problems--getting an even distribution of the hashed values--isn't important
> the way it is for other types of hashes.  Ants and Florian have now dug into
> how exactly that and specific CPU optimization concerns impact the best
> approach for 8K database pages.  This is very clearly a 9.4 project that is
> just getting started.

I'm not sure about the 9.4 part: if we ship with the builtin CRC as
committed, there is a 100% chance that we will want to switch out the
algorithm in 9.4, and there will be quite a large subset of users that
will find the performance unusable. If we change it to whatever we
come up with here, there is a small chance that the algorithm will
give worse than expected error detection rate in some circumstances
and we will want to offer a better algorithm. More probably it will be
good enough and the low performance hit will allow more users to turn
it on. This is a 16-bit checksum that we are talking about, not SHA-1; it
is expected to occasionally fail to detect errors. I can provide you
with a patch of the generic version of any of the discussed algorithms
within an hour, leaving plenty of time in beta or in 9.4 to
accommodate the optimized versions. It's literally a dozen self
contained lines of code.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Smith

On 4/17/13 6:32 PM, Tom Lane wrote:

The more I read of this thread, the more unhappy I get.  It appears that
the entire design process is being driven by micro-optimization for CPUs
being built by Intel in 2013.


And that's not going to get anyone past review, since all the tests I've 
been doing the last two weeks are on how fast an AMD Opteron 6234 with 
OS cache >> shared_buffers can run this.  The main thing I'm still 
worried about is what happens when you have a fast machine that can move 
memory around very quickly and an in-memory workload, but it's hamstrung 
by the checksum computation--and it's not a 2013 Intel machine.


The question I started with here was answered to some depth and then 
skipped past.  I'd like to jerk attention back to that, since I thought 
some good answers from Ants went by.  Is there a simple way to optimize 
the committed CRC computation (or a similar one with the same error 
detection properties) based on either:


a) Knowing that the input will be an 8K page, rather than the existing 
use case with an arbitrary sized WAL section.


b) Straightforward code rearrangement or optimization flags.

That was all I thought was still feasible to consider changing for 9.3 a 
few weeks ago.  And the possible scope has only been shrinking since then.



And I reiterate that there is theory out there about the error detection
capabilities of CRCs.  I'm not seeing any theory here, which leaves me
with very little confidence that we know what we're doing.


Let me see if I can summarize where the messages flying by are at since 
you'd like to close this topic for now:


-Original checksum feature used Fletcher checksums.  Its main problems, 
to quote wikipedia, include that it "cannot distinguish between blocks 
of all 0 bits and blocks of all 1 bits".


-Committed checksum feature uses truncated CRC-32.  This has known good 
error detection properties, but is expensive to compute.  There's reason 
to believe that particular computation will become cheaper on future 
platforms though.  But taking full advantage of that will require adding 
CPU-specific code to the database.


-The latest idea is using the Fowler–Noll–Vo hash function: 
https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash  There's 20 years of 
research around when that is good or bad.  The exact properties depend 
on magic "FNV primes":  http://isthe.com/chongo/tech/comp/fnv/#fnv-prime 
that can vary based on both your target block size and how many bytes 
you'll process at a time.  For PostgreSQL checksums, one of the common 
problems--getting an even distribution of the hashed values--isn't 
important the way it is for other types of hashes.  Ants and Florian 
have now dug into how exactly that and specific CPU optimization 
concerns impact the best approach for 8K database pages.  This is very 
clearly a 9.4 project that is just getting started.
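
For readers who haven't met FNV before, the textbook byte-at-a-time 32-bit
FNV-1a is tiny; the candidates being discussed are parallelized,
word-at-a-time adaptations of it rather than this exact function (the
constants below are the standard published ones):

#include <stdint.h>
#include <stddef.h>

uint32_t
fnv1a_32(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;        /* FNV-1a 32-bit offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];                 /* xor in the next octet... */
        h *= 16777619u;              /* ...then multiply by the FNV prime */
    }
    return h;
}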


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 2:25 AM, Florian Pflug  wrote:
> On Apr17, 2013, at 23:44 , Ants Aasma  wrote:
>> Performance results:
>> Mul-add checksums: 12.9 bytes/cycle
>> FNV-1a checksums: 13.5 bytes/cycle
>> FNV-1a + srl-1: 7.4 bytes/cycle
>>
>> Detection rates:
>> False positive rates:
>>                  Add-mul       FNV-1a     FNV-1a + srl-1
>> Single bit flip: 1:inf         1:129590   1:64795
>> Double bit flip: 1:148         1:511      1:53083
>> Triple bit flip: 1:673         1:5060     1:61511
>>   Quad bit flip: 1:1872        1:19349    1:68320
>> Write 0x00 byte: 1:774538137   1:118776   1:68952
>> Write 0xFF byte: 1:165399500   1:137489   1:68958
>>   Partial write: 1:59949       1:71939    1:89923
>>   Write garbage: 1:64866       1:64980    1:67732
>> Write run of 00: 1:57077       1:61140    1:59723
>> Write run of FF: 1:63085       1:59609    1:62977
>>
>> Test descriptions:
>> N bit flip: picks N random non-overlapping bits and flips their value.
>> Write X byte: overwrites a single byte with X.
>> Partial write: picks a random cut point, overwrites everything from
>> there to end with 0x00.
>> Write garbage/run of X: picks two random cut points and fills
>> everything in between with random values/X bytes.
>
> Cool, thanks for testing that! The results for FNV-1a + srl-1 look
> promising, I think. Its failure rate is consistently about 1:2^16,
> which is the value you'd expect. That gives me some confidence that
> the additional shift is working as expected.
>
> BTW, which prime are you using for FNV-1a and FNV-1a+srl1?

The official 32bit FNV one, 16777619.

Offsets were just random numbers. Seems good enough given the
following from the FNV page:

"These non-zero integers are the FNV-0 hashes of the following 32 octets:

chongo <Landon Curt Noll> /\../\"

>> The effect on false positive rates
>> for double bit errors is particularly impressive. I'm now running a
>> test run that shifts right by 13 to see how that works out; intuitively
>> it should help disperse the bits a lot faster.

Empirical results are slightly better with shift of 13:

Single bit flip: 1:61615
Double bit flip: 1:58078
Triple bit flip: 1:66329
  Quad bit flip: 1:62141
Write 0x00 byte: 1:66327
Write 0xFF byte: 1:65274
  Partial write: 1:71939
  Write garbage: 1:65095
 Write run of 0: 1:62845
Write run of FF: 1:64638

> Maybe, but it also means that *only* bits 14 and 15 actually affect bits
> below them, because all others are shifted out. If you choose the
> right prime it may still work; you'd have to pick one with
> enough lower bits set so that every bit affects bit 14 or 15 at some
> point…
>
> All in all a small shift seems better to me - if 1 for some reason
> isn't a good choice, I'd expect 3 or so to be a suitable
> replacement, but nothing much larger…

I don't think the big shift is a problem; the other bits were taken
into account by the multiply, and with the larger shift the next
multiplication will disperse the changes once again. Nevertheless, I'm
running the tests with a shift of 3 now.

> I should have some time tomorrow to spend on this, and will try
> to validate our FNV-1a modification, and see if I find a way to judge
> whether 1 is a good shift.

Great. I will spend some brain cycles on it too.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 1:32 AM, Tom Lane  wrote:
> Ants Aasma  writes:
>> I was thinking about something similar too. The big issue here is that
>> the parallel checksums already hide each other's latencies, effectively
>> executing one each of movdqu/pmullw/paddw each cycle, that's why the
>> N_SUMS adds up to 128 bytes not 16 bytes.
>
> The more I read of this thread, the more unhappy I get.  It appears that
> the entire design process is being driven by micro-optimization for CPUs
> being built by Intel in 2013.  That ought to be, at best, a fifth-order
> consideration, with full recognition that it'll be obsolete in two years,
> and is already irrelevant to anyone not running one of those CPUs.

The large scale structure takes into account the trends in computer
architecture. A lot more so than using anything straight out of the
literature. Specifically, computer architectures have hit a wall in
terms of sequential throughput, so the linear dependency chain in the
checksum algorithm will be the bottleneck soon if it isn't already.
From that it follows that a fast and future proof algorithm should not
calculate the checksum in a single long dependency chain. The proposed algorithms
divide the input into 64x64 and 32x64 chunks. It's easy to show that
both convert the dependency chain from O(n) to O(sqrt(n)). Secondly,
unless we pick something really popular, CPUs are unlikely to provide
support specifically for us, so the algorithm should be built from general
purpose computational pieces. Vector integer multiply and xor are
pretty much guaranteed to be there and fast on future CPUs. In my view
they are much more likely to be available and fast on future CPUs than
something like the Intel CRC32 acceleration.
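
(To spell out the O(sqrt(n)) step - my notation, not anything from the
patch: with k independent accumulators over n words, each lane's dependency
chain has length n/k, and combining the lanes at the end costs roughly k
more operations, so the critical path is about

    T(n, k) ~ n/k + k

which is minimized at k = sqrt(n), giving T ~ 2*sqrt(n) = O(sqrt(n)). An 8K
page is 4096 16-bit words or 2048 32-bit words, hence chunkings on the
order of 64x64 and 32x64.)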

> I would like to ban all discussion of assembly-language optimizations
> until after 9.3 is out, so that we can concentrate on what actually
> matters.  Which IMO is mostly the error detection rate and the probable
> nature of false successes.  I'm glad to see that you're paying at least
> some attention to that, but the priorities in this discussion are
> completely backwards.

I approached it from the angle of what needs to be done for a
fundamentally fast approach to have a good enough error detection rate
and no failure mode that lets a likely error pattern escape detection
as a false positive. The algorithms are simple enough and well studied enough
that the rewards from tweaking them are negligible. I think the
resulting performance speaks for itself. Now the question is what is a
good enough algorithm. In my view, the checksum is more like a canary
in the coal mine, not something that can be relied upon, and so
ultimate efficiency is not that important if there are no obvious
horrible cases. I can see that there are other views and so am
exploring different tradeoffs between performance and quality.

> And I reiterate that there is theory out there about the error detection
> capabilities of CRCs.  I'm not seeing any theory here, which leaves me
> with very little confidence that we know what we're doing.

I haven't found much literature that is of use here. There is theory
underlying this, coming from basic number theory and distilled into
rules for hash functions. For the FNV hash the prime supposedly is
carefully chosen, although all literature so far is saying "it is a
good choice, but here is not the place to explain why".

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 23:44 , Ants Aasma  wrote:
> Performance results:
> Mul-add checksums: 12.9 bytes/cycle
> FNV-1a checksums: 13.5 bytes/cycle
> FNV-1a + srl-1: 7.4 bytes/cycle
> 
> Detection rates:
> False positive rates:
>                  Add-mul       FNV-1a     FNV-1a + srl-1
> Single bit flip: 1:inf         1:129590   1:64795
> Double bit flip: 1:148         1:511      1:53083
> Triple bit flip: 1:673         1:5060     1:61511
>   Quad bit flip: 1:1872        1:19349    1:68320
> Write 0x00 byte: 1:774538137   1:118776   1:68952
> Write 0xFF byte: 1:165399500   1:137489   1:68958
>   Partial write: 1:59949       1:71939    1:89923
>   Write garbage: 1:64866       1:64980    1:67732
> Write run of 00: 1:57077       1:61140    1:59723
> Write run of FF: 1:63085       1:59609    1:62977
> 
> Test descriptions:
> N bit flip: picks N random non-overlapping bits and flips their value.
> Write X byte: overwrites a single byte with X.
> Partial write: picks a random cut point, overwrites everything from
> there to end with 0x00.
> Write garbage/run of X: picks two random cut points and fills
> everything in between with random values/X bytes.

Cool, thanks for testing that! The results for FNV-1a + srl-1 look
promising, I think. Its failure rate is consistently about 1:2^16,
which is the value you'd expect. That gives me some confidence that
the additional shift is working as expected.

BTW, which prime are you using for FNV-1a and FNV-1a+srl1?

> So adding in the shifted value nearly cuts the performance in half. I
> think that by playing with the instruction order I might coax the CPU
> scheduler to schedule the instructions better, but even in the best case
> it will be somewhat slower. The point to keep in mind is that even this
> slower speed is still faster than hardware accelerated CRC32, so all
> in all the hit might not be so bad.

Yeah. ~7 bytes/cycle still translates to over 10GB/s on a typical CPU,
so that's still plenty fast I'd say...

> The effect on false positive rates
> for double bit errors is particularly impressive. I'm now running a
> test run that shifts right by 13 to see how that works out; intuitively
> it should help disperse the bits a lot faster.

Maybe, but it also means that *only* bits 14 and 15 actually affect bits
below them, because all others are shifted out. If you choose the
right prime it may still work; you'd have to pick one with
enough lower bits set so that every bit affects bit 14 or 15 at some
point…

All in all a small shift seems better to me - if 1 for some reason
isn't a good choice, I'd expect 3 or so to be a suitable
replacement, but nothing much larger…

I should have some time tomorrow to spend on this, and will try 
to validate our FNV-1a modification, and see if I find a way to judge
whether 1 is a good shift.

>>> I wonder if we use 32bit FNV-1a's (the h = (h^v)*p variant) with
>>> different offset-basis values, would it be enough to just XOR fold the
>>> resulting values together. The algorithm looking like this:
>> 
>> Hm, this will make the algorithm less resilient to some particular
>> input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th
>> words), but those seem very unlikely to occur randomly. But if we're
>> worried about that, we could use your linear combination method for
>> the aggregation phase.
> 
> I don't think it significantly reduces resilience to permutations
> thanks to using different basis offsets and multiply not distributing
> over xor.

Oh, yeah, I thought you were still using 0 as the base offset. If you don't,
the objection is moot.

best regards,
Florian Pflug






Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr18, 2013, at 00:32 , Tom Lane  wrote:
> Ants Aasma  writes:
>> I was thinking about something similar too. The big issue here is that
>> the parallel checksums already hide each other's latencies, effectively
>> executing one each of movdqu/pmullw/paddw each cycle, that's why the
>> N_SUMS adds up to 128 bytes not 16 bytes.
> 
> The more I read of this thread, the more unhappy I get.  It appears that
> the entire design process is being driven by micro-optimization for CPUs
> being built by Intel in 2013.  That ought to be, at best, a fifth-order
> consideration, with full recognition that it'll be obsolete in two years,
> and is already irrelevant to anyone not running one of those CPUs.

Micro-optimization for particular CPUs yes, but general performance
considerations, no. For example, 2^n is probably one of the worst moduli
you can pick for a hash function - any prime would work much better.
But doing the computations modulo 2^16 or 2^32 carries zero performance
overhead, whereas picking another modulus requires some renormalization
after every operation. That, however, is *not* a given - it stems from
the fact that nearly all CPUs in existence operate on binary integers. This
fact must thus enter into the design phase very early, and makes
2^16 or 2^32 a sensible choice for a modulus *despite* its shortcomings,
simply because it allows for fast implementations.
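
(A tiny illustration of that point - hypothetical code, nothing from the
patches: the first accumulator gets its modulo-2^16 reduction for free from
unsigned wraparound, the second needs an explicit reduction on every step;
65521 is simply the largest prime below 2^16.)

#include <stdint.h>

uint16_t
accum_mod_pow2(const uint16_t *d, int n)
{
    uint16_t h = 0;

    for (int i = 0; i < n; i++)
        h = (uint16_t) ((h + d[i]) * 31);    /* wraps mod 2^16 for free */
    return h;
}

uint32_t
accum_mod_prime(const uint16_t *d, int n)
{
    uint32_t h = 0;

    for (int i = 0; i < n; i++)
        h = ((h + d[i]) * 31) % 65521;       /* explicit reduction each step */
    return h;
}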

> I would like to ban all discussion of assembly-language optimizations
> until after 9.3 is out, so that we can concentrate on what actually
> matters.  Which IMO is mostly the error detection rate and the probable
> nature of false successes.  I'm glad to see that you're paying at least
> some attention to that, but the priorities in this discussion are
> completely backwards.

I'd say lots of attention is paid to that, but there's *also* attention
paid to speed. Which is good, because ideally we want to end up with
a checksum that both has good error-detection properties *and* good
performance. If performance is of no concern to us, then there's little
reason not to use CRC…

> And I reiterate that there is theory out there about the error detection
> capabilities of CRCs.  I'm not seeing any theory here, which leaves me
> with very little confidence that we know what we're doing.

If you've got any pointers to literature on error-detection capabilities
of CPU-friendly checksum functions, please share. I am aware of the vast
literature on CRC, and also on some other algebraic approaches, but
none of those even come close to the speed of FNV+shift (unless there's
a special CRC instruction, that is). And there's also a ton of stuff on
cryptographic hashing, but those are optimized for a completely different
use-case...

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Add \ns command to psql

2013-04-17 Thread Robert Haas
On Tue, Apr 16, 2013 at 5:40 AM, Colin 't Hart  wrote:
> Here's a new version of a small patch to psql I'm using locally.
>
> It adds a command \ns to psql which is a shortcut to set the
> SEARCH_PATH variable.
>
> I've also added tab completion making this command much more useful. I
> don't think tab completition would be possible if this command was
> defined as a variable (which was another suggestion offered at the
> time).

It's possible that the tab completion argument is a sufficient reason
for including this, but I'm kinda skeptical.  The amount of typing
saved is pretty minimal, considering that set sea<TAB> completes to
set search_path.  Assuming we had proper tab completion for set
search_path = (and off-hand, it doesn't look like that does anything
useful), this would be saving 5 keystrokes every time you want to
change the search path (set sea<TAB> is eight keystrokes, where
\ns is four... but it also saves you the semicolon at the end).
 I'm sure some people would find that worthwhile, but personally, I
don't.  Short commands are cryptic, and IMHO psql is already an
impenetrable thicket of difficult-to-remember abbreviations.  I've
been using it for more than 10 years now and I still have to run \?
on a semi-regular basis.  I think that if we start adding things like
this, that help message is going to rapidly fill up with a whole lot
more abbreviations for things that are quite a bit incrementally less
useful than what's there right now.

After all, if we're going to have \ns to set the search path, why not
have something similar for work_mem, or random_page_cost?  I set both
of those variables more often than I set search_path; and there could
easily be someone else out there whose favorite GUC is client_encoding
or whatever.  And, for that matter, why stop with GUCs?  \ct for
CREATE TABLE would save lots of typing, too

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Ants Aasma  writes:
> I was thinking about something similar too. The big issue here is that
> the parallel checksums already hide each other's latencies, effectively
> executing one each of movdqu/pmullw/paddw each cycle, that's why the
> N_SUMS adds up to 128 bytes not 16 bytes.

The more I read of this thread, the more unhappy I get.  It appears that
the entire design process is being driven by micro-optimization for CPUs
being built by Intel in 2013.  That ought to be, at best, a fifth-order
consideration, with full recognition that it'll be obsolete in two years,
and is already irrelevant to anyone not running one of those CPUs.

I would like to ban all discussion of assembly-language optimizations
until after 9.3 is out, so that we can concentrate on what actually
matters.  Which IMO is mostly the error detection rate and the probable
nature of false successes.  I'm glad to see that you're paying at least
some attention to that, but the priorities in this discussion are
completely backwards.

And I reiterate that there is theory out there about the error detection
capabilities of CRCs.  I'm not seeing any theory here, which leaves me
with very little confidence that we know what we're doing.

regards, tom lane




Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-17 Thread Tom Lane
Marko Kreen  writes:
> On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>>> I think his point is why don't we clear currval() on DISCARD ALL?  I
>>> can't think of a good reason we don't.

>> Because we'd have to invent a new suboperation DISCARD SEQUENCES,
>> for one thing, in order to be consistent.  I'd rather ask why it's
>> important that we should throw away such state.  It doesn't seem to
>> me to be important enough to justify a new subcommand.

> "consistency" is a philosophical thing.

No, it's a critical tool in complexity management.  When you're dealing
with systems as complicated as a database, every little non-orthogonal
detail adds up.  DISCARD ALL has a clear definition in terms of simpler
commands, and it's going to stay that way.  Either this is worth a
subcommand, or it's not worth worrying about at all.

> But currval() is quite a noticeable thing that DISCARD ALL should clear.

If it were as obvious and noticeable as all that, somebody would have
noticed before now.  We've had DISCARD ALL with its current meaning
since 8.3, and nobody complained in the five-plus years since that
shipped.

At this point, even if a concrete case were made why DISCARD ALL should
clear currval (and I repeat that no credible case has been made; nobody
has for example pointed to a reasonably-well-designed application that
this breaks), there would be a pretty strong backwards-compatibility
argument not to change it.

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Wed, Apr 17, 2013 at 6:54 PM, Florian Pflug  wrote:
> On Apr17, 2013, at 16:47 , Ants Aasma  wrote:
>> This made me remember, the issue I had was with high order bits, not
>> with low order ones, somehow I got them confused. The exact issue is
>> that the high order bits don't affect any bit lower than them. It's
>> easy to see that if you remember the shift and add nature of multiply.
>> Unfortunately XOR will not fix that. Neither will adding an offset
>> basis. This is the fundamental thing that is behind the not-so-great
>> uncorrelated bit error detection rate.
>
> Right. We could maybe fix that by extending the update step to
>
>   t = s[j] ^ d[i,j]
>   s[j] = (t * PRIME) ^ (t >> 1)
>
> or something like that. Shifting t instead of (t * PRIME) should
> help to reduce the performance impact, since a reordering CPU should
> be able to parallelize the multiply and the shift. Note though that
> I haven't really though that through extensively - the general idea
> should be sound, but whether 1 is a good shifting amount I do not
> know.

I was thinking about something similar too. The big issue here is that
the parallel checksums already hide each other's latencies, effectively
executing one each of movdqu/pmullw/paddw each cycle, that's why the
N_SUMS adds up to 128 bytes not 16 bytes.
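
For anyone who wants the shape of the computation without reading SSE
intrinsics, a scalar model of the parallel FNV-1a + srl-1 variant looks
roughly like this (lane count, per-lane offset bases and the final fold are
illustrative only; the real code keeps the lanes in vector registers):

#include <stdint.h>

#define N_SUMS    32                 /* 32 lanes x 4-byte words = 128 bytes/iteration */
#define FNV_PRIME 16777619u

static uint16_t
checksum_page_model(const uint32_t *page, int nwords)  /* nwords divisible by N_SUMS */
{
    uint32_t sums[N_SUMS];
    uint32_t result = 0;

    for (int j = 0; j < N_SUMS; j++)
        sums[j] = j + 1;             /* distinct per-lane offset basis */

    for (int i = 0; i < nwords; i += N_SUMS)
        for (int j = 0; j < N_SUMS; j++)
        {
            uint32_t t = sums[j] ^ page[i + j];

            sums[j] = (t * FNV_PRIME) ^ (t >> 1);   /* FNV-1a step + srl-1 xor */
        }

    for (int j = 0; j < N_SUMS; j++) /* aggregate the lanes */
        result = (result ^ sums[j]) * FNV_PRIME;

    return (uint16_t) (result ^ (result >> 16));    /* fold to 16 bits */
}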

I went ahead and coded up both the parallel FNV-1a and parallel FNV-1a
+ srl1-xor variants and ran performance tests and detection rate tests
on both.

Performance results:
Mul-add checksums: 12.9 bytes/cycle
FNV-1a checksums: 13.5 bytes/cycle
FNV-1a + srl-1: 7.4 bytes/cycle

Detection rates:
False positive rates:
                 Add-mul       FNV-1a     FNV-1a + srl-1
Single bit flip: 1:inf         1:129590   1:64795
Double bit flip: 1:148         1:511      1:53083
Triple bit flip: 1:673         1:5060     1:61511
  Quad bit flip: 1:1872        1:19349    1:68320
Write 0x00 byte: 1:774538137   1:118776   1:68952
Write 0xFF byte: 1:165399500   1:137489   1:68958
  Partial write: 1:59949       1:71939    1:89923
  Write garbage: 1:64866       1:64980    1:67732
Write run of 00: 1:57077       1:61140    1:59723
Write run of FF: 1:63085       1:59609    1:62977

Test descriptions:
N bit flip: picks N random non-overlapping bits and flips their value.
Write X byte: overwrites a single byte with X.
Partial write: picks a random cut point, overwrites everything from
there to end with 0x00.
Write garbage/run of X: picks two random cut points and fills
everything in between with random values/X bytes.
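
(The tests are easy to reproduce; for example, one trial of the N-bit-flip
test is essentially the following - hypothetical harness code with the
checksum under test passed in, not the actual test program:)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flip nbits distinct random bits in a copy of the page and report whether
 * the checksum fails to notice.  Assumes len <= 8192; rand() stands in for
 * a proper RNG here. */
static int
bit_flip_escapes_detection(uint16_t (*checksum) (const unsigned char *, size_t),
                           const unsigned char *page, size_t len, int nbits)
{
    unsigned char buf[8192];
    uint16_t before;
    int      flipped = 0;

    memcpy(buf, page, len);
    before = checksum(buf, len);

    while (flipped < nbits)
    {
        size_t        bit = (size_t) rand() % (len * 8);
        unsigned char mask = (unsigned char) (1 << (bit % 8));

        if ((buf[bit / 8] ^ page[bit / 8]) & mask)
            continue;                /* already flipped, pick another bit */
        buf[bit / 8] ^= mask;
        flipped++;
    }

    return checksum(buf, len) == before;    /* true = undetected corruption */
}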

So adding in the shifted value nearly cuts the performance in half. I
think that by playing with the instruction order I might coax the CPU
scheduler to schedule the instructions better, but even in the best case
it will be somewhat slower. The point to keep in mind is that even this
slower speed is still faster than hardware accelerated CRC32, so all
in all the hit might not be so bad. The effect on false positive rates
for double bit errors is particularly impressive. I'm now running a
test run that shifts right by 13 to see how that works out; intuitively
it should help disperse the bits a lot faster.

>> I wonder if we use 32bit FNV-1a's (the h = (h^v)*p variant) with
>> different offset-basis values, would it be enough to just XOR fold the
>> resulting values together. The algorithm looking like this:
>
> Hm, this will make the algorithm less resilient to some particular
> input permutations (e.g. those which swap the 64*i-th and the (64*i+1)-th
> words), but those seem very unlikely to occur randomly. But if we're
> worried about that, we could use your linear combination method for
> the aggregation phase.

I don't think it significantly reduces resilience to permutations
thanks to using different basis offsets and multiply not distributing
over xor.

>> Speaking against this option is the fact that we will need to do CPU
>> detection at startup to make it fast on x86 CPUs that support SSE4.1,
>> and the fact that AMD CPUs before 2011 will run it an order of
>> magnitude slower (but still faster than the best CRC).
>
> Hm, CPU detection isn't that hard, and given the speed at which Intel
> currently invents new instructions we'll end up going that route sooner
> or later anyway, I think.

Sure, it's not that hard, but it does have an order of magnitude more
design decisions than #if defined(__x86_64__). Maybe a first stab
could avoid a generic infrastructure and just have the checksum
function as a function pointer, with the default "trampoline"
implementation running a cpuid and overwriting the function pointer
with either the optimized or generic versions and then calling it.
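
Something along these lines, for instance (names are made up for
illustration, and the CPU probe uses the GCC/clang __builtin_cpu_supports()
builtin; a real patch would need a more portable detection path):

#include <stdint.h>
#include <stddef.h>

typedef uint16_t (*checksum_fn) (const void *page, size_t len);

static uint16_t checksum_resolve(const void *page, size_t len);

/* All callers go through this pointer; the first call resolves it. */
static checksum_fn checksum_page = checksum_resolve;

static uint16_t
checksum_generic(const void *page, size_t len)
{
    /* portable plain-C implementation would live here */
    const unsigned char *p = page;
    uint32_t h = 0;

    while (len--)
        h = (h ^ *p++) * 16777619u;
    return (uint16_t) (h ^ (h >> 16));
}

static uint16_t
checksum_sse41(const void *page, size_t len)
{
    /* vectorized implementation would live here; fall back for the sketch */
    return checksum_generic(page, len);
}

static uint16_t
checksum_resolve(const void *page, size_t len)
{
    /* runs once: pick an implementation, overwrite the pointer, call it */
    if (__builtin_cpu_supports("sse4.1"))
        checksum_page = checksum_sse41;
    else
        checksum_page = checksum_generic;
    return checksum_page(page, len);
}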

>> Any opinions if it would be a reasonable tradeoff to have a better
>> checksum with great performance on latest x86 CPUs and good
>> performance on other architectures at the expense of having only ok
>> performance on older AMD CPUs?
>
> The loss on AMD is offset by the increased performance on machines
> where we can't vectorize, I'd say.

+1 Old AMD machines won'

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:59:12PM -0700, Jeff Davis wrote:
> On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote:
> > > AFAIK, there's currently no per-page checksum flag. Still, being only
> > > able to go from checksummed to not-checksummed probably is for all
> > > practical purposes the same as not being able to pg_upgrade at all.
> > > Otherwise, why would people have enabled checksums in the first place?
> > 
> > Good point, but it is _an_ option, at least.
> > 
> > I would like to know the answer of how an upgrade from checksum to
> > no-checksum would behave so I can modify pg_upgrade to allow it.
> 
> Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if
> someone has checksums enabled and wants to disable it, why is pg_upgrade
> the right time to do that? Wouldn't it make more sense to allow them to
> do that at any time?

Well, right now, pg_upgrade is the only way you could potentially turn
off checksums.  You are right that we might eventually want a command,
but my point is that we currently have a limitation in pg_upgrade that
might not be necessary.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [GENERAL] currval and DISCARD ALL

2013-04-17 Thread Marko Kreen
On Tue, Apr 16, 2013 at 05:09:19PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I think his point is why don't we clear currval() on DISCARD ALL?  I
> > can't think of a good reason we don't.
> 
> Because we'd have to invent a new suboperation DISCARD SEQUENCES,
> for one thing, in order to be consistent.  I'd rather ask why it's
> important that we should throw away such state.  It doesn't seem to
> me to be important enough to justify a new subcommand.

"consistency" is a philosophical thing.  Practical reason for
subcommands is possibility to have partial reset for special
situations, pooling or otherwise.  But such usage seems rather
rare in real life.

If the sequences are not worth a subcommand, then let's not give them
one and just wait until someone comes up with an actual reason
to have one.

But currval() is quite a noticeable thing that DISCARD ALL should clear.

> Or, if you'd rather a more direct answer: wanting this sounds like
> evidence of bad application design.  Why is your app dependent on
> getting failures from currval, and isn't there a better way to do it?

It may not sound like it, but that's exactly the request - because
DISCARD ALL leaves user-visible state around, it's hard to fix an
application that depends on broken assumptions.


In fact, it was a surprise to me that currval() works across transactions.
My alternative proposal would be to get rid of such silly behaviour...

-- 
marko





Re: [HACKERS] event trigger API documentation?

2013-04-17 Thread Peter Eisentraut
On 4/17/13 3:20 PM, Dimitri Fontaine wrote:
>> It would have been good to have at least one untrusted language with
>> > event trigger support, so that you can hook in external auditing or
>> > logging systems.  With the existing PL/pgSQL support, the possible
>> > actions are a bit limited.
> Well, you do realise that the only information you get passed down to
> the event trigger code explicitly are the event name and the command
> tag, and nothing else, right?

Offhand, that seems about enough, but I'm just beginning to explore.

Chances are, event triggers will end up somewhere near the top of the
release announcements, so we should have a consistent message about what
to do with them and how to use them.  If, for now, we say we only
support writing them in PL/pgSQL, and here is how to do that, and here
are some examples, that's fine.  But currently, it's not quite clear.

Surely you had some use cases in mind when you set out to implement
this.  What were they, and where are we now in relation to them?




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Jeff Davis
On Wed, 2013-04-17 at 16:58 +0100, Greg Stark wrote:
> On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug  wrote:
> > Is there any way we can change the checksum algorithm in 9.4
> > *without* breaking pg_upgrade?
> 
> Personally I think we're going to need a solution for page format
> changes someday eventually
> 
> What advantages are we postponing now to avoid it?
> 
> * 32-bit checksums?
> * Being able to enable/disable checksums?
> 
> Anything else?

I'm not sure that changing the page format is the most difficult part of
enabling/disabling checksums. It's easy enough to have page header bits
if the current information is not enough (and those bits were there, but
Heikki requested their removal and I couldn't think of a concrete reason
to keep them).

Eventually, it would be nice to be able to break the page format and
have more space for things like checksums (and probably a few other
things, maybe some visibility-related optimizations). But that's a few
years off and we don't have any real plan for that.

What I wanted to accomplish with this patch is the simplest checksum
mechanism that we could get that would be fast enough that many people
would be able to use it. I expect it to be useful until we do decide to
break the page format.

Regards,
Jeff Davis






Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Jeff Davis
On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote:
> > AFAIK, there's currently no per-page checksum flag. Still, being only
> > able to go from checksummed to not-checksummed probably is for all
> > practical purposes the same as not being able to pg_upgrade at all.
> > Otherwise, why would people have enabled checksums in the first place?
> 
> Good point, but it is _an_ option, at least.
> 
> I would like to know the answer of how an upgrade from checksum to
> no-checksum would behave so I can modify pg_upgrade to allow it.

Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if
someone has checksums enabled and wants to disable it, why is pg_upgrade
the right time to do that? Wouldn't it make more sense to allow them to
do that at any time?

Regards,
Jeff Davis






[HACKERS] Fix typo in contrib/hstore/crc32.c comment

2013-04-17 Thread Fabrízio de Royes Mello
Hi all,

The attached patch fixes a little typo in a contrib/hstore/crc32.c comment.

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


fix_typo_hstore_crc32_comment.patch
Description: Binary data



Re: [HACKERS] event trigger API documentation?

2013-04-17 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Well, if documentation had been available well before beta, other
> procedural languages might have gained support for event triggers.  If
> it's not being documented, it might not happen very soon.

It's been a moving target for the last two years, and until very
recently what to document was not clear enough to spend any time on
actually writing the docs.

Please also note that the first series of patches did include the
support code for all the core PLs, but Robert didn't feel like committing
that and no other committer stepped up.

I'm struggling to understand how to properly solve the problem here from
an organisational perspective. Before beta was not a good time for the
people involved, and was not a good time for other people to get
involved. Beta is not a good time to fix what couldn't be done before.

When are we supposed to work on the rough edges left when a patch went
through 8 commit fests and so many discussions that it's quite hard
indeed to step back and understand what's in and what's missing to make
it sensible for the release?

Maybe the right answer is to remove the documentation about event
triggers completely for 9.3 and tell the users about them later when we
have something other than just internal infrastructure.

Now, if it's ok to add support to the other PLs, I can cook up a patch from the
bits I had done last year; the only work should be removing variables.

> It would have been good to have at least one untrusted language with
> event trigger support, so that you can hook in external auditing or
> logging systems.  With the existing PL/pgSQL support, the possible
> actions are a bit limited.

Well, you do realise that the only information you get passed down to
the event trigger code explicitly are the event name and the command
tag, and nothing else, right?

If you have a use case that requires any other information, then
documenting the event triggers will do nothing to help you implement it;
you will need to code in C and go look at the backend sources.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote:
>> But having said that, I'm not sure why this would be pg_upgrade's
>> problem.  By definition, we do not want pg_upgrade running around
>> looking at individual data pages.  Therefore, whatever we might do
>> about checksum algorithm changes would have to be something that can be
>> managed on-the-fly by the newer server.

> Well, my idea was that pg_upgrade would allow upgrades from old clusters
> with the same checksum algorithm version, but not non-matching ones. 
> This would allow the checksum algorithm to be changed and force
> pg_upgrade to fail.

It's rather premature to be defining pg_upgrade's behavior for a
situation that doesn't exist yet, and may very well never exist
in that form.  It seems more likely to me that we'd want to allow
incremental algorithm changes, in which case pg_upgrade ought not do
anything about this case anyway.

regards, tom lane




Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Uh, not sure how pg_upgrade would detect that as the version number is
> > not stored in pg_controldata, e.g.:
> 
> > Data page checksums:  enabled/disabled
> 
> That seems pretty shortsighted.  The field probably ought to be defined
> as containing a checksum algorithm ID number, not a boolean.
> 
> But having said that, I'm not sure why this would be pg_upgrade's
> problem.  By definition, we do not want pg_upgrade running around
> looking at individual data pages.  Therefore, whatever we might do
> about checksum algorithm changes would have to be something that can be
> managed on-the-fly by the newer server.

Well, my idea was that pg_upgrade would allow upgrades from old clusters
with the same checksum algorithm version, but not non-matching ones. 
This would allow the checksum algorithm to be changed and force
pg_upgrade to fail.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-04-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane  wrote:
>> I think we need to do *something* (and accordingly have added this to
>> the 9.3 open items page so we don't forget about it).  Whether Robert's
>> idea is the best one probably depends in part on how clean the patch
>> turns out to be.

> The attached patch attempts to implement this.  I discovered that, in
> fact, we have a number of places in our initdb-time scripts that rely
> on the current behavior, but they weren't hard to fix; and in fact I
> think the extra verbosity is probably not a bad thing here.

> See what you think.

I think this breaks contrib/adminpack, and perhaps other extensions.
They'd not be hard to fix with script changes, but they'd be broken.

In general, we would now have a situation where relocatable extensions
could never be installed into pg_catalog.  That might be OK, but at
least it would need to be documented.

Also, I think we'd be pretty much hard-wiring the decision that pg_dump
will never dump objects in pg_catalog, because its method for selecting
the creation schema won't work in that case.  That probably is all right
too, but we need to realize it's a consequence of this.

As far as the code goes, OK except I strongly disapprove of removing
the comment about temp_missing at line 3512.  The coding is not any less
a hack in that respect for having been pushed into a subroutine.  If
you want to rewrite the comment, fine, but failing to point out that
something funny is going on is not a service to readers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Bruce Momjian  writes:
> Uh, not sure how pg_upgrade would detect that as the version number is
> not stored in pg_controldata, e.g.:

>   Data page checksums:  enabled/disabled

That seems pretty shortsighted.  The field probably ought to be defined
as containing a checksum algorithm ID number, not a boolean.

But having said that, I'm not sure why this would be pg_upgrade's
problem.  By definition, we do not want pg_upgrade running around
looking at individual data pages.  Therefore, whatever we might do
about checksum algorithm changes would have to be something that can be
managed on-the-fly by the newer server.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:22:01PM -0400, Tom Lane wrote:
> Greg Stark  writes:
> > On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug  wrote:
> >> Is there any way we can change the checksum algorithm in 9.4
> >> *without* breaking pg_upgrade?
> 
> > Personally I think we're going to need a solution for page format
> > changes someday eventually
> 
> > What advantages are we postponing now to avoid it?
> 
> Um, other than the ability to make a release?
> 
> We aren't going to hold up 9.3 until that particular bit of pie in the
> sky lands.  Indeed I don't expect to see it available in the next couple
> years either.  When we were looking at that seriously, two or three
> years ago, arbitrary page format changes looked *hard*.
> 
> The idea of bumping the page format version number to signal a checksum
> algorithm change might work though.

Uh, not sure how pg_upgrade would detect that as the version number is
not stored in pg_controldata, e.g.:

Data page checksums:  enabled/disabled

Do we need to address this for 9.3?  (Yuck)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Greg Stark  writes:
> On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug  wrote:
>> Is there any way we can change the checksum algorithm in 9.4
>> *without* breaking pg_upgrade?

> Personally I think we're going to need a solution for page format
> changes someday eventually

> What advantages are we postponing now to avoid it?

Um, other than the ability to make a release?

We aren't going to hold up 9.3 until that particular bit of pie in the
sky lands.  Indeed I don't expect to see it available in the next couple
years either.  When we were looking at that seriously, two or three
years ago, arbitrary page format changes looked *hard*.

The idea of bumping the page format version number to signal a checksum
algorithm change might work though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 06:33:58PM +0200, Florian Pflug wrote:
> > I was going to ask about the flexibility of pg_upgrade and checksums. 
> > Right now you have to match the old and new cluster checksum modes, but
> > it seems it would be possible to allow pg_upgrade use from checksum to
> > no-checksum servers.  Does the backend look at the pg_controldata setting,
> > or at the page checksum flag?  If the former, it seems pg_upgrade could
> > run a no-checksum server just fine that had checksum information on
> > its pages.  This might give us more flexibility in changing the checksum
> > algorithm in the future, i.e. you only lose checksum ability.
> 
> AFAIK, there's currently no per-page checksum flag. Still, being only
> able to go from checksummed to not-checksummed probably is for all
> practical purposes the same as not being able to pg_upgrade at all.
> Otherwise, why would people have enabled checksums in the first place?

Good point, but it is _an_ option, at least.

I would like to know the answer of how an upgrade from checksum to
no-checksum would behave so I can modify pg_upgrade to allow it.


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 18:15 , Bruce Momjian  wrote:
> On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote:
>> However, you're right that time's running out. It'd be a shame though
>> if we'd lock ourselves into CRC as the only available algorithm essentially
>> forever.  Is there any way we can change the checksum algorithm in 9.4
>> *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used
>> for that - we could make version 5 mean "just like version 4, but with
>> a different checksum algorithm". Since the layout wouldn't actually
>> change, that'd be far easier to pull off than actually supporting multiple
>> page layouts. If that works, then shipping 9.3 with CRC is probably
>> the best solution. If not, we should see to it that something like Ants
>> parallel version of FNV or a smallget into 9.3 if at all possible,
>> IMHO.
> 
> I was going to ask about the flexibility of pg_upgrade and checksums. 
> Right now you have to match the old and new cluster checksum modes, but
> it seems it would be possible to allow pg_upgrade use from checksum to
> no-checksum servers.  Does the backend look at the pg_controldata setting,
> or at the page checksum flag?  If the former, it seems pg_upgrade could
> run a no-checksum server just fine that had checksum information on
> its pages.  This might give us more flexibility in changing the checksum
> algorithm in the future, i.e. you only lose checksum ability.

AFAIK, there's currently no per-page checksum flag. Still, being only
able to go from checksummed to not-checksummed probably is for all
practical purposes the same as not being able to pg_upgrade at all.
Otherwise, why would people have enabled checksums in the first place?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote:
> However, you're right that time's running out. It'd be a shame though
> if we'd lock ourselves into CRC as the only available algorithm essentially
> forever.  Is there any way we can change the checksum algorithm in 9.4
> *without* breaking pg_upgrade? Maybe pd_pagesize_version could be used
> for that - we could make version 5 mean "just like version 4, but with
> a different checksum algorithm". Since the layout wouldn't actually
> change, that'd be far easier to pull off than actually supporting multiple
> page layouts. If that works, then shipping 9.3 with CRC is probably
> the best solution. If not, we should see to it that something like Ants
> parallel version of FNV or a smallget into 9.3 if at all possible,
> IMHO.

I was going to ask about the flexibility of pg_upgrade and checksums. 
Right now you have to match the old and new cluster checksum modes, but
it seems it would be possible to allow pg_upgrade use from checksum to
no-checksum servers.  Does the backend look at the pg_controldata setting,
or at the page checksum flag?  If the former, it seems pg_upgrade could
run a no-checksum server just fine that had checksum information on
its pages.  This might give us more flexibility in changing the checksum
algorithm in the future, i.e. you only lose checksum ability.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Changing schema on the fly

2013-04-17 Thread Daniele Varrazzo
Hello dear -hackers,

I'm maintaining pg_reorg/pg_repack, which you may know effectively
allow online VACUUM FULL or CLUSTER. It works by installing logging
triggers to keep data up-to-date during the migration, creating a copy
of the table, and eventually swapping the tables relfilenodes.

The new table is forced to keep exactly the same physical structure,
e.g. restoring dropped columns too. Failing to do so was apparently a
big mistake, looking at this commit [1]. My knowledge of Postgres at
that level is limited: what I imagine is that cached plans keep the
offset of the field in the row, so data ends up read/written in the
wrong position if such offset changes. The commit message mentions
views and stored procedures being affected.

Is there a way to force invalidation of all the cache that may hold a
reference to the columns offset? Or is the problem an entirely
different one and the above cache invalidation wouldn't be enough?

If we managed to allow schema change in pg_repack we could allow many
more online manipulations features: changing data types, reordering
columns, really dropping columns freeing up space etc.

Thank you very much,

-- Daniele

[1] 
https://github.com/reorg/pg_repack/commit/960930b645df8eeeda15f176c95d3e450786f78a


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Stark
On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug  wrote:
> Is there any way we can change the checksum algorithm in 9.4
> *without* breaking pg_upgrade?

Personally I think we're going to need a solution for page format
changes someday eventually

What advantages are we postponing now to avoid it?

* 32-bit checksums?
* Being able to enable/disable checksums?

Anything else?


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 16:47 , Ants Aasma  wrote:
> This made me remember, the issue I had was with high order bits, not
> with low order ones, somehow I got them confused. The exact issue is
> that the high order bits don't affect any bit lower than them. It's
> easy to see that if you remember the shift and add nature of multiply.
> Unfortunately XOR will not fix that. Neither will adding an offset
> basis. This is the fundamental thing that is behind the not-so-great
> uncorrelated bit error detection rate.

Right. We could maybe fix that by extending the update step to

  t = s[j] ^ d[i,j]
  s[j] = (t * PRIME) ^ (t >> 1)

or something like that. Shifting t instead of (t * PRIME) should
help to reduce the performance impact, since a reordering CPU should
be able to parallelize the multiply and the shift. Note though that
I haven't really thought that through extensively - the general idea
should be sound, but whether 1 is a good shifting amount I do not
know.
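
In scalar C the modified update step would look roughly like this (the
64x64 uint16 layout follows the discussion above; the PRIME value is just
a placeholder, not a concrete proposal):

#include <stdint.h>

#define PRIME 65521u    /* placeholder 16-bit prime, not the final choice */

/* One pass over a page laid out as 64 rows of 64 uint16 words.  The extra
 * (t >> 1) term feeds high-order bits back into lower bit positions,
 * which the plain multiply-and-xor does not do. */
static void
checksum_pass(uint16_t s[64], const uint16_t d[64][64])
{
    for (int i = 0; i < 64; i++)
        for (int j = 0; j < 64; j++)
        {
            uint16_t t = s[j] ^ d[i][j];

            s[j] = (uint16_t) ((t * PRIME) ^ (t >> 1));
        }
}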

> While I understand that linearity is not a desirable property, I
> couldn't think of a realistic case where it would hurt. I can see how
> it can hurt checksums of variable length values, but for our fixed
> buffer case it's definitely not so clear cut. On the pro side the
> distributive property that is behind linearity allowed me to do final
> aggregation in a tree form, performing the multiplies in parallel
> instead of linearly. This adds up to the difference between 250 cycles
> (64*(3 cycle IMUL + 1 cycle XOR)) and 25 cycles (4*5 cycle pmullw + 5
> cycle addw). Given that the main loop is about 576 cycles, this is a
> significant difference.

> I wonder if we use 32bit FNV-1a's (the h = (h^v)*p variant) with
> different offset-basis values, would it be enough to just XOR fold the
> resulting values together. The algorithm looking like this:

Hm, this will make the algorithm less resilient to some particular
input permutations (e.g. those which swap the 64*i-th and the (64+1)-ith
words), but those seem very unlikely to occur randomly. But if we're
worried about that, we could use your linear combination method for
the aggregation phase.
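
Spelled out in scalar C, the quoted idea would be something like the
sketch below. Only the prime and the first offset basis are the standard
FNV-1a constants; the stream count and the other bases are placeholders:

#include <stddef.h>
#include <stdint.h>

#define FNV32_PRIME 16777619u           /* standard 32-bit FNV prime */

/* Only bases[0] is the standard FNV-1a offset basis; the rest are
 * placeholders for whatever distinct values end up being chosen. */
static const uint32_t bases[4] = { 2166136261u, 1u, 2u, 3u };

static uint32_t
parallel_fnv1a(const uint32_t *words, size_t nwords)
{
    uint32_t    h[4];

    for (int k = 0; k < 4; k++)
        h[k] = bases[k];

    /* stream k hashes every 4th word: h = (h ^ v) * prime;
     * assumes nwords is a multiple of 4 */
    for (size_t i = 0; i + 3 < nwords; i += 4)
        for (int k = 0; k < 4; k++)
            h[k] = (h[k] ^ words[i + k]) * FNV32_PRIME;

    /* XOR-fold the partial results together */
    return h[0] ^ h[1] ^ h[2] ^ h[3];
}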

> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86 that support SSE4.1,
> and the fact that AMD CPUs before 2011 will run it an order of
> magnitude slower (but still faster than the best CRC).

Hm, CPU detection isn't that hard, and given the speed at which Intel
currently invents new instructions we'll end up going that route sooner
or later anyway, I think. 

> Any opinions if it would be a reasonable tradeoff to have a better
> checksum with great performance on latest x86 CPUs and good
> performance on other architectures at the expense of having only ok
> performance on older AMD CPUs?

The loss on AMD is offset by the increased performance on machines
where we can't vectorize, I'd say.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger API documentation?

2013-04-17 Thread Peter Eisentraut
On 4/17/13 5:41 AM, Dimitri Fontaine wrote:
> I'm not sure about ripping it out, it does not sound like a good idea to
> me. It needs some addition and C level examples yes. The plan was to
> build a contrib module as an example, that would cancel any (supported)
> command you try to run by means of ereport(ERROR, …);. Then add that in
> pieces in the docs with details about what's going on.
> 
> While the commit fest was still running, it didn't look like the right
> time to work on that. Beta looks like the time to be working on that.

Well, if documentation had been available well before beta, other
procedural languages might have gained support for event triggers.  If
it's not being documented, it might not happen very soon.

It would have been good to have at least one untrusted language with
event trigger support, so that you can hook in external auditing or
logging systems.  With the existing PL/pgSQL support, the possible
actions are a bit limited.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 17:09 , Bruce Momjian  wrote:
> As much as I love the idea of improving the algorithm, it is disturbing
> we are discussing this so close to beta, with an algorithm that is under
> analysis, with no (runtime) CPU detection, and in something that is
> going to be embedded into our data page format.  I can't even think of
> another case where we do run-time CPU detection.

We could still ship the new checksum algorithm with 9.3, but omit the
SSE-optimized version, i.e. include only the plain C implementation.
I think Ants mentioned somewhere that gcc does a pretty good job of
vectorizing that, so people who really care (and who use GCC)
could compile with -msse4.1 -funroll-loops -ftree-vectorize, and get
performance close to that of a hand-coded SSE version.

The important decision we're facing is which algorithm to use. I personally
believe Ants is on the right track there - FNV or a variant thereof
looks like a good choice to me, but the details have yet to be nailed
I think.

However, you're right that time's running out. It'd be a shame though
if we'd lock ourselves into CRC as the only available algorithm essentially
forever.  Is there any way we can change the checksum algorithm in 9.4
*without* breaking pg_upgrade? Maybe pd_pagesize_version could be used
for that - we could make version 5 mean "just like version 4, but with
a different checksum algorithm". Since the layout wouldn't actually
change, that'd be far easier to pull off than actually supporting multiple
page layouts. If that works, then shipping 9.3 with CRC is probably
the best solution. If not, we should see to it that something like Ants
parallel version of FNV or a smallget into 9.3 if at all possible,
IMHO.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 05:47:55PM +0300, Ants Aasma wrote:
> The SSE4.1 implementation of this would be as fast as the last patch,
> generic version will be faster and we avoid the linearity issue. By
> using different offsets for each of the partial hashes we don't
> directly suffer from commutativity of the final xor folding. By using
> the xor-then-multiply variant the last values hashed have their bits
> mixed before folding together.
> 
> Speaking against this option is the fact that we will need to do CPU
> detection at startup to make it fast on the x86 that support SSE4.1,
> and the fact that AMD CPUs before 2011 will run it an order of
> magnitude slower (but still faster than the best CRC).
> 
> Any opinions if it would be a reasonable tradeoff to have a better
> checksum with great performance on latest x86 CPUs and good
> performance on other architectures at the expense of having only ok
> performance on older AMD CPUs?
> 
> Also, any good suggestions where should we do CPU detection when we go
> this route?

As much as I love the idea of improving the algorithm, it is disturbing
we are discussing this so close to beta, with an algorithm that is under
analysis, with no (runtime) CPU detection, and in something that is
going to be embedded into our data page format.  I can't even think of
another case where we do run-time CPU detection.

I am wondering if we need to tell users that pg_upgrade will not be
possible if you enable page-level checksums, so we are not trapped with
something we want to improve in 9.4.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Wed, Apr 17, 2013 at 2:26 AM, Florian Pflug  wrote:
>>> This raises two question. First, why are there two primes? You could
>>> just as well using a single prime q and set p=q^64 mod 2^16. You then
>>> get
>>>  S = sum V[i,j] * q^(64*(64-i) + (64-j))
>>>    = sum V[i,j] * q^(4096 - 64*(i-1) - j)
>>> You get higher prime powers that way, but you can easily choose a prime
>>> that yields distinct values mod 2^16 for exponents up to 16383. Your
>>> PRIME2, for example, does. (It wraps around for 16384, i.e.
>>> PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
>>> 16384 is the Carmichael function's value at 2^16)
>>
>> The experimental detection rate is about the same if we use a single
>> prime. But I think you have the analytical form wrong here. It should
>> be given q = p:
>>
>>S = sum V[i,j] * p^(64-i) * p^(64-j)
>>  = sum V[i,j] * p^(64 - i + 64 - j)
>>  = sum V[i,j] * p^(128 - i -j)
>
> Yeah, if you set q = p that's true. My suggestion was p=q^64 though...

So it was, I guess it was too late here and I missed it... All things
considered that is a good suggestion, if for nothing else, the generic
implementation can be smaller this way.
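
Just to have the algebra from above in one place (1-based i, j in 1..64,
everything mod 2^16):

    S = \sum_{i,j} V_{i,j}\, p^{64-i}\, q^{64-j}

    p = q:      S = \sum_{i,j} V_{i,j}\, q^{128-i-j}
    p = q^{64}: S = \sum_{i,j} V_{i,j}\, q^{64(64-i)+(64-j)}

With p = q^{64} every (i,j) pair gets its own exponent in 0..4095, whereas
with p = q all pairs with equal i+j share an exponent.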

>>> Second, why does it use addition instead of XOR? It seems that FNV
>>> usually XORs the terms together instead of adding them?
>>
>> Testing showed slightly better detection rate for adds. Intuitively I
>> think it's because the carry introduces some additional mixing.
>
> Hm, but OTOH it makes S linear in V, i.e. if you have two inputs
> V1,V2 and V = V1 + V2, then S = S1 + S2. Also, if V' = V*m, then
> S' = S*m. The second property is quite undesirable, I think. Assume
> all the V[i,j] are divisible by 2^k, i.e. have zeros at all bit
> positions 0..(k-1). Then, due to linearity, S is also divisible by
> 2^k, i.e. also has no ones before the k-th bit. This means, for example
> that if you hash values values which all have their lowest bit cleared,
> you get only 2^15 distinct hash values. If they all have the two
> lowest bits cleared, you get only 2^14 distinct values, and so on…
>
> Generally, linearity doesn't seem to be a property that one wants
> in a hash I think, so my suggestion is to stick to XOR.

This made me remember, the issue I had was with high order bits, not
with low order ones, somehow I got them confused. The exact issue is
that the high order bits don't affect any bit lower than them. It's
easy to see that if you remember the shift and add nature of multiply.
Unfortunately XOR will not fix that. Neither will adding an offset
basis. This is the fundamental thing that is behind the not-so-great
uncorrelated bit error detection rate.

While I understand that linearity is not a desirable property, I
couldn't think of a realistic case where it would hurt. I can see how
it can hurt checksums of variable length values, but for our fixed
buffer case it's definitely not so clear cut. On the pro side the
distributive property that is behind linearity allowed me to do final
aggregation in a tree form, performing the multiplies in parallel
instead of linearly. This adds up to the difference between 250 cycles
(64*(3 cycle IMUL + 1 cycle XOR)) and 25 cycles (4*5 cycle pmullw + 5
cycle addw). Given that the main loop is about 576 cycles, this is a
significant difference.
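
To make the tree-form point concrete, here is a scalar sketch that folds
64 per-column states into sum_j s[j] * q^(63-j) mod 2^16 both ways. The
constant is a placeholder; the point is only that the tree variant's
multiplies within a level are independent of each other:

#include <stdint.h>

#define Q 65521u                /* placeholder 16-bit prime */

static uint16_t
pow_q(unsigned w)               /* q^w mod 2^16 */
{
    uint32_t    r = 1;

    while (w--)
        r = (uint16_t) (r * Q);
    return (uint16_t) r;
}

/* Horner's rule: 63 multiplies, each dependent on the previous one. */
static uint16_t
fold_linear(const uint16_t s[64])
{
    uint32_t    acc = s[0];

    for (int j = 1; j < 64; j++)
        acc = (uint16_t) (acc * Q + s[j]);
    return (uint16_t) acc;
}

/* Tree form: log2(64) = 6 levels; the multiplies within each level are
 * independent, so they can be issued in parallel (or vectorized). */
static uint16_t
fold_tree(const uint16_t s[64])
{
    uint16_t    t[64];

    for (int j = 0; j < 64; j++)
        t[j] = s[j];
    for (unsigned width = 1, n = 64; n > 1; width *= 2, n /= 2)
    {
        uint16_t    qw = pow_q(width);

        for (unsigned k = 0; k < n / 2; k++)
            t[k] = (uint16_t) ((uint32_t) t[2 * k] * qw + t[2 * k + 1]);
    }
    return t[0];                /* equals fold_linear(s) */
}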

>>> Here, btw, is a page on FNV hashing. It mentions a few rules for
>>> picking suitable primes
>>>
>>> http://www.isthe.com/chongo/tech/comp/fnv
>>
>> Unfortunately the rules don't apply here because of the hash size.
>
> Yeah :-(.
>
> I noticed that their 32-bit prime only has a single one outside
> the first 16 bits. Maybe we can take advantage of that and use a
> 32-bit state while still providing decent performance on machines
> without a 32-bit x 32-bit -> 32-bit multiply instruction?

Looking at the Power instruction set, a 32bit mul by the FNV prime
would look like this:

vmulouh tmp1, hash, prime
vmladduhm tmp1, hash, prime<<16
vslw tmp2, hash, 24
vadduwm hash, tmp1, tmp2

That is 4 instructions to multiply 4 values. Depending on the specific
execution ports on the processor it might faster or slower than the
scalar version but not by a whole lot. Main benefit would be that the
intermediate state could be held in registers.
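
The property that makes this cheap (the 32-bit FNV prime 0x01000193 has
only a single set bit above the low 16 bits) can be sanity-checked in
scalar C; this is illustration only, not the vector code above:

#include <assert.h>
#include <stdint.h>

#define FNV32_PRIME 16777619u   /* 0x01000193 */

/* h * 0x01000193 == h * 0x193 + (h << 24)  (mod 2^32), because the only
 * bit of the prime above bit 15 is bit 24. */
static uint32_t
mul_fnv_prime(uint32_t h)
{
    return h * 0x193u + (h << 24);
}

int
main(void)
{
    for (uint32_t h = 0; h < 1000000u; h++)
        assert(mul_fnv_prime(h) == h * FNV32_PRIME);
    return 0;
}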

> If we lived in an Intel-only world, I'd suggest going with a
> 32-bit state, since SSE4.1 support is *very* wide-spread already -
> the last CPUs without it came out over 5 years ago, I think.
> (Core2 and later support SSE4.1, and some later Core1 do too)
>
> But unfortunately things look bleak even for other x86
> implementations - AMD support SSE4.1 only starting with
> Bulldozer, which came out 2011 or so I believe. Leaving the x86
> realm, it seems that only ARM's NEON provides the instructions
> we'd need - AltiVec seems to be support only 16-bit multiplies,
> and from what some quick googling brought up, MIPS and SPARC
> SIMD instructions look no better..
>
> OTOH, chances are that nobody will ever d

Re: [HACKERS] erroneous restore into pg_catalog schema

2013-04-17 Thread Robert Haas
On Tue, Jan 29, 2013 at 6:00 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jan 29, 2013 at 2:30 PM, Alvaro Herrera
>>> Robert, are you working on this?
>
>> I wasn't, but I can, if we agree on it.
>
> I think we need to do *something* (and accordingly have added this to
> the 9.3 open items page so we don't forget about it).  Whether Robert's
> idea is the best one probably depends in part on how clean the patch
> turns out to be.

The attached patch attempts to implement this.  I discovered that, in
fact, we have a number of places in our initdb-time scripts that rely
on the current behavior, but they weren't hard to fix; and in fact I
think the extra verbosity is probably not a bad thing here.

See what you think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


explicit-pg-catalog.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger API documentation?

2013-04-17 Thread Tom Lane
Dimitri Fontaine  writes:
> I'm not sure about ripping it out, it does not sound like a good idea to
> me. It needs some addition and C level examples yes. The plan was to
> build a contrib module as an example, that would cancel any (supported)
> command you try to run by means of ereport(ERROR, …);. Then add that in
> pieces in the docs with details about what's going on.

> While the commit fest was still running, it didn't look like the right
> time to work on that. Beta looks like the time to be working on that.

> What do you think about the proposal here?

We're not adding new contrib modules during beta.  Expanding the
documentation seems like a fine beta-period activity, though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks

2013-04-17 Thread Robert Haas
On Fri, Apr 12, 2013 at 2:44 PM, Kohei KaiGai  wrote:
> Yes, of course. The attached one replaces the getObjectDescription in
> sepgsql/proc.c, and relative changes in regression test.

Thanks.  Committed.  I also committed the first two hunks of your
cleanup patch but omitted the third one, which is not well-worded and
seems like overkill anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO links broken?

2013-04-17 Thread Robert Haas
On Wed, Apr 17, 2013 at 9:23 AM, Magnus Hagander  wrote:
>> Hmm.  Sorry for the lack of detail.  I assumed the problem was obvious
>> and widespread because I clicked on the first link I saw in the Todo
>> and it didn't work.  But after clicking a bunch more links from the
>> Todo, I only found three that fail.
>>
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
>> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php
>
> Works now, so that seems to have been fixed by the reverting of the
> patch. It might be a while before they all recover due to caching
> issues, but both of these work now for me, which seems to indicate the
> fix is the right one.
>
>> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/
>
> It works with %40 for me now, so it might have been related - can you
> check if it is still an issue for you? It might be different in
> different browsers.

Yeah, it seems OK now.  Thanks for the quick response.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO links broken?

2013-04-17 Thread Magnus Hagander
On Wed, Apr 17, 2013 at 3:14 PM, Robert Haas  wrote:
> On Wed, Apr 17, 2013 at 8:48 AM, Magnus Hagander  wrote:
>> Yes. We can infer that. It makes it a whole lot easier to fix
> something with better bug reports than that, of course, as I'm sure you
>> (Robert in this case, not Stephen) are generally aware of.
>>
>> I've reverted a patch that was applied a few days ago that dealt with
>> how URLs are parsed, and I think that's the one that's responsible.
>> But it would be good to have an actual example of what didn't work,
>> because the links i tried all worked...
>
> Hmm.  Sorry for the lack of detail.  I assumed the problem was obvious
> and widespread because I clicked on the first link I saw in the Todo
> and it didn't work.  But after clicking a bunch more links from the
> Todo, I only found three that fail.
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
> http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php


Works now, so that seems to have been fixed by the reverting of the
patch. It might be a while before they all recover due to caching
issues, but both of these work now for me, which seems to indicate the
fix is the right one.


> http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

It works with %40 for me now, so it might have been related - can you
check if it is still an issue for you? It might be different in
different browsers.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO links broken?

2013-04-17 Thread Robert Haas
On Wed, Apr 17, 2013 at 8:48 AM, Magnus Hagander  wrote:
> Yes. We can infer that. It makes it a whole lot easier to fix
> something with better bug repors than that, of course, as I'm sure you
> (Robert in this case, not Stephen) are generally aware of.
>
> I've reverted a patch that was applied a few days ago that dealt with
> how URLs are parsed, and I think that's the one that's responsible.
> But it would be good to have an actual example of what didn't work,
> because the links i tried all worked...

Hmm.  Sorry for the lack of detail.  I assumed the problem was obvious
and widespread because I clicked on the first link I saw in the Todo
and it didn't work.  But after clicking a bunch more links from the
Todo, I only found three that fail.

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01340.php
http://archives.postgresql.org/pgsql-hackers/2011-03/msg01831.php
http://www.postgresql.org/message-id/4B577E9F.8000505%40dunslane.net/

That last one works if I change %40 to @, so that one might be a wiki
problem rather than an archives problem.  In fact, for all I know the
other two might have been broken all along too; I'm just assuming they
used to work.

Sorry for going overboard,

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Amit Kapila
On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote:
> On Apr17, 2013, at 12:22 , Amit Kapila  wrote:
> > Do you mean to say that as an error has occurred, so it would not be
> able to
> > flush received WAL, which could result in loss of WAL?
> > I think even if error occurs, it will call flush in WalRcvDie(),
> before
> > terminating WALReceiver.
> 
> Hm, true, but for that to prevent the problem the inner processing
> loop needs to always read up to EOF before it exits and we attempt
> to send a reply. Which I don't think it necessarily does. Assume,
> that the master sends a chunk of data, waits a bit, and finally
> sends the shutdown record and exits. The slave might then receive
> the first chunk, and it might trigger sending a reply. At the time
> the reply is sent, the master has already sent the shutdown record
> and closed the connection, and we'll thus fail to reply and abort.
> Since the shutdown record has never been read from the socket,
> XLogWalRcvFlush won't flush it, and the slave ends up behind the
> master.
> 
> Also, since XLogWalRcvProcessMsg responds to keep-alives messages,
> we might also error out of the inner processing loop if the server
> closes the socket after sending a keepalive but before we attempt
> to respond.
> 
> Fixing this on the receive side alone seems quite messy and fragile.
> So instead, I think we should let the master send a shutdown message
> after it has sent everything it wants to send, and wait for the client
> to acknowledge it before shutting down the socket.
> 
> If the client fails to respond, we could log a fat WARNING.

Your explanation seems to be okay, but before discussing the exact
solution, I think it would be better to first reproduce the actual problem
and then discuss this solution.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO links broken?

2013-04-17 Thread Magnus Hagander
On Wed, Apr 17, 2013 at 2:13 PM, Robert Haas  wrote:
> On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck
>  wrote:
>> Many of the links in the TODO wiki page result in a "page not found" error.
>> Is this page up-to-date?
>> Can anything be inferred about the status of these items from the broken
>> link?
>
> I think what we can infer is that the new archives code is broken.  I
> hope someone is planning to fix that.  If there's been some decision

Yes. We can infer that. It makes it a whole lot easier to fix
something with better bug reports than that, of course, as I'm sure you
(Robert in this case, not Stephen) are generally aware of.

I've reverted a patch that was applied a few days ago that dealt with
how URLs are parsed, and I think that's the one that's responsible.
But it would be good to have an actual example of what didn't work,
because the links i tried all worked...

> made that we don't have to support the historical URLs for our
> archives pages, I think that's a really bad plan; those links are in a
> lot more places than just the Todo.

No, the plan has always been to support those. There are no plans to
remove that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO links broken?

2013-04-17 Thread Robert Haas
On Wed, Apr 17, 2013 at 12:21 AM, Stephen Scheck
 wrote:
> Many of the links in the TODO wiki page result in a "page not found" error.
> Is this page up-to-date?
> Can anything be inferred about the status of these items from the broken
> link?

I think what we can infer is that the new archives code is broken.  I
hope someone is planning to fix that.  If there's been some decision
made that we don't have to support the historical URLs for our
archives pages, I think that's a really bad plan; those links are in a
lot more places than just the Todo.

As for your actual question, the TODO list is an accumulation of items
that someone, sometime over the last ten years, thought would be
valuable to work on, and nobody objected too strenuously to the idea.
The fact that it's in the TODO list doesn't mean that anyone is
working on it now, that anyone will ever work on it, or that people
would still think it was a good idea if it were re-proposed today.
It's just kind of a list to jump start people's thinking, and
shouldn't be taken as particularly official.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 12:22 , Amit Kapila  wrote:
> Do you mean to say that as an error has occurred, so it would not be able to
> flush received WAL, which could result in loss of WAL?
> I think even if error occurs, it will call flush in WalRcvDie(), before
> terminating WALReceiver.

Hm, true, but for that to prevent the problem the inner processing
loop needs to always read up to EOF before it exits and we attempt
to send a reply. Which I don't think it necessarily does. Assume,
that the master sends a chunk of data, waits a bit, and finally
sends the shutdown record and exits. The slave might then receive
the first chunk, and it might trigger sending a reply. At the time
the reply is sent, the master has already sent the shutdown record
and closed the connection, and we'll thus fail to reply and abort.
Since the shutdown record has never been read from the socket,
XLogWalRcvFlush won't flush it, and the slave ends up behind the
master.

Also, since XLogWalRcvProcessMsg responds to keep-alives messages,
we might also error out of the inner processing loop if the server
closes the socket after sending a keepalive but before we attempt
to respond.

Fixing this on the receive side alone seems quite messy and fragile.
So instead, I think we should let the master send a shutdown message
after it has sent everything it wants to send, and wait for the client
to acknowledge it before shutting down the socket.

If the client fails to respond, we could log a fat WARNING.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Amit Kapila
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao  wrote:
> > At fast shutdown, after walsender sends the checkpoint record and
> > closes the replication connection, walreceiver can detect the close
> > of connection before receiving all WAL records. This means that,
> > even if walsender sends all WAL records, walreceiver cannot always
> > receive all of them.
> 
> That sounds like a bug in walreceiver to me.
> 
> The following code in walreceiver's main loop looks suspicious:
> 
>   /*
>    * Process the received data, and any subsequent data we
>    * can read without blocking.
>    */
>   for (;;)
>   {
> if (len > 0)
> {
>   /* Something was received from master, so reset timeout */
>   ...
>   XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
> }
> else if (len == 0)
>   break;
> else if (len < 0)
> {
>   ereport(LOG,
>   (errmsg("replication terminated by primary server"),
>errdetail("End of WAL reached on timeline %u at %X/%X",
>  startpointTLI,
>  (uint32) (LogstreamResult.Write >> 32),
>  (uint32) LogstreamResult.Write)));
>   ...
> }
> len = walrcv_receive(0, &buf);
>   }
> 
>   /* Let the master know that we received some data. */
>   XLogWalRcvSendReply(false, false);
> 
>   /*
>    * If we've written some records, flush them to disk and
>    * let the startup process and primary server know about
>    * them.
>    */
>   XLogWalRcvFlush(false);
> 
> The loop at the top looks fine - it specifically avoids throwing
> an error on EOF. But the code then proceeds to XLogWalRcvSendReply()
> which doesn't seem to have the same smarts - it simply does
> 
>   if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>   PQflush(streamConn))
>   ereport(ERROR,
>   (errmsg("could not send data to WAL stream: %s",
>   PQerrorMessage(streamConn))));
> 
> Unless I'm missing something, that certainly seems to explain
> how a standby can lag behind even after a controlled shutdown of
> the master.

Do you mean to say that as an error has occurred, so it would not be able to
flush received WAL, which could result in loss of WAL?
I think even if error occurs, it will call flush in WalRcvDie(), before
terminating WALReceiver.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] event trigger API documentation?

2013-04-17 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> I'm specifically looking for C API documentation, along the lines of
> http://www.postgresql.org/docs/devel/static/trigger-interface.html.
>
> The current chapter on event triggers might as well be ripped out and
> folded into the CREATE EVENT TRIGGER reference page, because it explains
> nothing about programming those triggers.

I'm not sure about ripping it out, it does not sound like a good idea to
me. It needs some addition and C level examples yes. The plan was to
build a contrib module as an example, that would cancel any (supported)
command you try to run by means of ereport(ERROR, …);. Then add that in
pieces in the docs with details about what's going on.

While the commit fest was still running, it didn't look like the right
time to work on that. Beta looks like the time to be working on that.

What do you think about the proposal here?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers