Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-10-06 Thread Robert Haas
On Tue, Sep 16, 2014 at 6:49 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 As it happens, I also wrote an implementation of Slice-by-4 the other
 day

 If Heikki's version works I see little need to use my/Abhijit's
 patch. That version has part of it under the zlib license. If Heikki's
 version is a 'clean room', then I'd say we go with it. It looks really
 quite similar though... We can make minor changes like additional
 unrolling without problems later on.


 I used http://create.stephan-brumme.com/crc32/#slicing-by-8-overview as
 reference - you can probably see the similarity. Any implementation is going
 to look more or less the same, though; there aren't that many ways to write
 the implementation.

So, it seems like the status of this patch is:

1. It probably has a bug, since Amit's testing seemed to show that it
wasn't returning the same results as unpatched master.

2. The performance tests showed a significant win on an important workload.

3. It's not in any CommitFest anywhere.

Given point #2, it seems like we ought to find a way to keep this
from sliding into oblivion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-16 Thread Amit Kapila
On Sat, Sep 13, 2014 at 1:33 AM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
 On 09/12/2014 10:54 PM, Abhijit Menon-Sen wrote:
 At 2014-09-12 22:38:01 +0300, hlinnakan...@vmware.com wrote:
 We probably should consider switching to a faster CRC algorithm again,
 regardless of what we do with compression.

 As it happens, I'm already working on resurrecting a patch that Andres
 posted in 2010 to switch to zlib's faster CRC implementation.

 As it happens, I also wrote an implementation of Slice-by-4 the other day
:-).
 Haven't gotten around to posting it, but here it is.

In case we use this implementation for everything that uses the
COMP_CRC32() macro, won't it cause problems for older-version
databases?  I created a database with HEAD code, and when I
tried to start the server after applying this patch, it gave the below error:
FATAL:  incorrect checksum in control file

In general, the idea sounds quite promising.  To see how it performs
on small to medium size data, I have used the attached test, which was
written by you (with some additional tests) during the performance testing
of the WAL reduction patch in 9.4.

Performance Data
----------------
Non-default settings

autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 20 min

HEAD -
testname | wal_generated | duration
-----------------------------------------+---------------+------------------
 two short fields, no change | 583802008 | 11.6727559566498
 two short fields, no change | 580888024 | 11.8558299541473
 two short fields, no change | 580889680 | 11.5449349880219
 two short fields, one changed   | 620646400 | 11.6657111644745
 two short fields, one changed   | 620667904 | 11.6010649204254
 two short fields, one changed   | 622079320 | 11.6774570941925
 two short fields, both changed  | 620649656 | 12.0892491340637
 two short fields, both changed  | 620648360 | 12.1650269031525
 two short fields, both changed  | 620653952 | 12.2125108242035
 one short and one long field, no change | 329018192 | 4.74178600311279
 one short and one long field, no change | 329021664 | 4.71507883071899
 one short and one long field, no change | 330326496 | 4.84932994842529
 ten tiny fields, all changed| 701358488 |  14.236780166626
 ten tiny fields, all changed| 701355328 | 14.0777900218964
 ten tiny fields, all changed| 701358272 | 14.1000919342041
 hundred tiny fields, all changed| 315656568 | 6.99316620826721
 hundred tiny fields, all changed| 314875488 | 6.85715913772583
 hundred tiny fields, all changed| 315263768 | 6.94613790512085
 hundred tiny fields, half changed   | 314878360 | 6.89090895652771
 hundred tiny fields, half changed   | 314877216 | 7.05924606323242
 hundred tiny fields, half changed   | 314881816 | 6.93445992469788
 hundred tiny fields, half nulled| 236244136 | 6.43347096443176
 hundred tiny fields, half nulled| 236248104 | 6.30539107322693
 hundred tiny fields, half nulled| 236501040 | 6.33403086662292
 9 short and 1 long, short changed   | 262373616 | 4.24646091461182
 9 short and 1 long, short changed   | 262375136 | 4.49821400642395
 9 short and 1 long, short changed   | 262379840 | 4.38264393806458
(27 rows)


Patched -
testname | wal_generated | duration
-----------------------------------------+---------------+------------------
 two short fields, no change | 580897400 | 10.6518769264221
 two short fields, no change | 581779816 | 10.7118690013885
 two short fields, no change | 581013224 | 10.8294110298157
 two short fields, one changed   | 620646264 | 10.8309078216553
 two short fields, one changed   | 620652872 | 10.8480410575867
 two short fields, one changed   | 620812376 | 10.9162290096283
 two short fields, both changed  | 620651792 | 10.9025599956512
 two short fields, both changed  | 620652304 | 10.7771129608154
 two short fields, both changed  | 620649960 | 11.0185468196869
 one short and one long field, no change | 329022000 | 3.88278198242188
 one short and one long field, no change | 329023656 | 4.01899003982544
 one short and one long field, no change | 329022992 | 3.91587209701538
 ten tiny fields, all changed| 701353296 | 12.7748699188232
 ten tiny fields, all changed| 701354848 |  12.761589050293
 ten tiny fields, all changed| 701356520 | 12.6703131198883
 hundred tiny fields, all changed| 314879424 | 6.25606894493103
 hundred tiny fields, all changed| 314878416 | 6.32905578613281
 hundred tiny fields, all changed| 314878464 | 

Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-16 Thread Andres Freund
On 2014-09-16 15:43:06 +0530, Amit Kapila wrote:
 On Sat, Sep 13, 2014 at 1:33 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:
  On 09/12/2014 10:54 PM, Abhijit Menon-Sen wrote:
  At 2014-09-12 22:38:01 +0300, hlinnakan...@vmware.com wrote:
  We probably should consider switching to a faster CRC algorithm again,
  regardless of what we do with compression.
 
  As it happens, I'm already working on resurrecting a patch that Andres
  posted in 2010 to switch to zlib's faster CRC implementation.
 
  As it happens, I also wrote an implementation of Slice-by-4 the other day
 :-).
  Haven't gotten around to posting it, but here it is.
 
 In case we use this implementation for everything that uses the
 COMP_CRC32() macro, won't it cause problems for older-version
 databases?  I created a database with HEAD code, and when I
 tried to start the server after applying this patch, it gave the below error:
 FATAL:  incorrect checksum in control file

That's indicative of a bug. This really shouldn't cause such problems -
at least my version was compatible with the current definition, and IIRC
Heikki's should be the same in theory. If I read it right.

 In general, the idea sounds quite promising.  To see how it performs
 on small to medium size data, I have used the attached test, which was
 written by you (with some additional tests) during the performance testing
 of the WAL reduction patch in 9.4.

Yes, we should really do this.

 The patched version gives better results in all cases
 (in the range of 10-15%); though this is not a perfect test,
 it gives a fair idea that the patch is quite promising.  I think that to
 test the benefit from CRC calculation for full pages, we can have a
 checkpoint during each test (maybe after the inserts).  Let me know
 what other kinds of tests you think are required to see the
 gain/loss from this patch.

I actually think we don't really need this. It's pretty evident that
slice-by-4 is a clear improvement.

 I think the main difference between this patch and what Andres
 developed some time back is the code for a manually unrolled loop
 doing 32 bytes at once, so once Andres or Abhijit posts an
 updated version, we can do some performance tests to see
 if there is any additional gain.

If Heikki's version works I see little need to use my/Abhijit's
patch. That version has part of it under the zlib license. If Heikki's
version is a 'clean room', then I'd say we go with it. It looks really
quite similar though... We can make minor changes like additional
unrolling without problems later on.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-16 Thread Heikki Linnakangas

On 09/16/2014 01:28 PM, Andres Freund wrote:

On 2014-09-16 15:43:06 +0530, Amit Kapila wrote:

On Sat, Sep 13, 2014 at 1:33 AM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

On 09/12/2014 10:54 PM, Abhijit Menon-Sen wrote:

At 2014-09-12 22:38:01 +0300, hlinnakan...@vmware.com wrote:

We probably should consider switching to a faster CRC algorithm again,
regardless of what we do with compression.


As it happens, I'm already working on resurrecting a patch that Andres
posted in 2010 to switch to zlib's faster CRC implementation.


As it happens, I also wrote an implementation of Slice-by-4 the other day

:-).

Haven't gotten around to posting it, but here it is.


In case we use this implementation for everything that uses the
COMP_CRC32() macro, won't it cause problems for older-version
databases?  I created a database with HEAD code, and when I
tried to start the server after applying this patch, it gave the below error:
FATAL:  incorrect checksum in control file


That's indicative of a bug. This really shouldn't cause such problems -
at least my version was compatible with the current definition, and IIRC
Heikki's should be the same in theory. If I read it right.


In general, the idea sounds quite promising.  To see how it performs
on small to medium size data, I have used the attached test, which was
written by you (with some additional tests) during the performance testing
of the WAL reduction patch in 9.4.


Yes, we should really do this.


The patched version gives better results in all cases
(in the range of 10-15%); though this is not a perfect test,
it gives a fair idea that the patch is quite promising.  I think that to
test the benefit from CRC calculation for full pages, we can have a
checkpoint during each test (maybe after the inserts).  Let me know
what other kinds of tests you think are required to see the
gain/loss from this patch.


I actually think we don't really need this. It's pretty evident that
slice-by-4 is a clear improvement.


I think the main difference between this patch and what Andres
developed some time back is the code for a manually unrolled loop
doing 32 bytes at once, so once Andres or Abhijit posts an
updated version, we can do some performance tests to see
if there is any additional gain.


If Heikki's version works I see little need to use my/Abhijit's
patch. That version has part of it under the zlib license. If Heikki's
version is a 'clean room', then I'd say we go with it. It looks really
quite similar though... We can make minor changes like additional
unrolling without problems later on.


I used http://create.stephan-brumme.com/crc32/#slicing-by-8-overview as 
reference - you can probably see the similarity. Any implementation is 
going to look more or less the same, though; there aren't that many ways 
to write the implementation.


- Heikki





Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-16 Thread Andres Freund
On 2014-09-16 13:49:20 +0300, Heikki Linnakangas wrote:
 I used http://create.stephan-brumme.com/crc32/#slicing-by-8-overview as
 reference - you can probably see the similarity. Any implementation is going
 to look more or less the same, though; there aren't that many ways to write
 the implementation.

True.

I think I see the problem causing Amit's test to fail. Amit, did
you use the powerpc machine?

Heikki, you swap bytes unconditionally - afaics that's wrong on big
endian systems. My patch had:

+ static inline uint32 swab32(const uint32 x);
+ static inline uint32 swab32(const uint32 x){
+   return ((x & (uint32)0x000000ffUL) << 24) |
+   ((x & (uint32)0x0000ff00UL) << 8) |
+   ((x & (uint32)0x00ff0000UL) >> 8) |
+   ((x & (uint32)0xff000000UL) >> 24);
+ }
+ 
+ #if defined __BIG_ENDIAN__
+ #define cpu_to_be32(x) (x)
+ #else
+ #define cpu_to_be32(x) swab32(x)
+ #endif


I guess yours needs something similar. I personally like the cpu_to_be*
naming - it imo makes it pretty clear what happens.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-16 Thread Amit Kapila
On Tue, Sep 16, 2014 at 4:27 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-09-16 13:49:20 +0300, Heikki Linnakangas wrote:
  I used http://create.stephan-brumme.com/crc32/#slicing-by-8-overview as
  reference - you can probably see the similarity. Any implementation is
going
  to look more or less the same, though; there aren't that many ways to
write
  the implementation.

 True.

 I think I see the problem causing Amit's test to fail. Amit, did
 you use the powerpc machine?

Yes.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-12 Thread Andres Freund
On 2014-09-12 23:03:00 +0300, Heikki Linnakangas wrote:
 On 09/12/2014 10:54 PM, Abhijit Menon-Sen wrote:
 At 2014-09-12 22:38:01 +0300, hlinnakan...@vmware.com wrote:
 
 We probably should consider switching to a faster CRC algorithm again,
 regardless of what we do with compression.
 
 As it happens, I'm already working on resurrecting a patch that Andres
 posted in 2010 to switch to zlib's faster CRC implementation.
 
 As it happens, I also wrote an implementation of Slice-by-4 the other day
 :-). Haven't gotten around to posting it, but here it is.
 
 What algorithm does zlib use for CRC calculation?

Also slice-by-4, with a manually unrolled loop doing 32 bytes at once, using
individual slice-by-4 steps. IIRC I tried removing that and it slowed things
down overall. What it also did was move the CRC computation into a function.
I'm not sure why I did it that way, but it really might be beneficial - if you
look at profiles today there are sometimes icache/decoding stalls...

Hm. Let me look:
http://archives.postgresql.org/message-id/201005202227.49990.andres%40anarazel.de

Ick, there's quite some debugging leftovers ;)

I think it might be a good idea to also switch the polynomial at the same
time. I really, really think we should, when the hardware supports it, use
the polynomial that's available in SSE4.2. It has similar properties and can
be implemented in software just the same...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: CRC algorithm (was Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes)

2014-09-12 Thread Arthur Silva
On 12/09/2014 17:23, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-09-12 23:03:00 +0300, Heikki Linnakangas wrote:
  On 09/12/2014 10:54 PM, Abhijit Menon-Sen wrote:
  At 2014-09-12 22:38:01 +0300, hlinnakan...@vmware.com wrote:
  
  We probably should consider switching to a faster CRC algorithm again,
  regardless of what we do with compression.
  
  As it happens, I'm already working on resurrecting a patch that Andres
  posted in 2010 to switch to zlib's faster CRC implementation.
 
  As it happens, I also wrote an implementation of Slice-by-4 the other
day
  :-). Haven't gotten around to posting it, but here it is.
 
  What algorithm does zlib use for CRC calculation?

 Also slice-by-4, with a manually unrolled loop doing 32 bytes at once, using
 individual slice-by-4 steps. IIRC I tried removing that and it slowed things
 down overall. What it also did was move the CRC computation into a function.
 I'm not sure why I did it that way, but it really might be beneficial - if you
 look at profiles today there are sometimes icache/decoding stalls...

 Hm. Let me look:

http://archives.postgresql.org/message-id/201005202227.49990.andres%40anarazel.de

 Ick, there's quite some debugging leftovers ;)

 I think it might be a good idea to also switch the polynomial at the same
 time. I really, really think we should, when the hardware supports it, use
 the polynomial that's available in SSE4.2. It has similar properties and can
 be implemented in software just the same...

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



This Google library is worth a look: https://code.google.com/p/crcutil/ -
it has some extremely optimized versions.