On 04/03/2015 05:28 AM, Abhijit Menon-Sen wrote:
At 2015-04-03 00:33:10 +0300, hlinn...@iki.fi wrote:
I came up with the attached.
I like it very much.
src/port/Makefile has (note src/srv):
+# pg_crc32c_sse42.o and its _src.o version need CFLAGS_SSE42
+pg_crc32c_sse42.o:
At 2015-03-25 19:18:51 +0200, hlinn...@iki.fi wrote:
I think we'll need a version check there. […]
You want to write that or should I?
I'm not familiar with MSVC at all, so it would be nice if you did it.
How do you like this latest version of the patch otherwise?
I'm sorry, but I'm still
On Thu, Apr 2, 2015 at 5:33 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
It was added in gcc 4.2. That's good enough for me.
I think it's fine to have optional optimizations that require gcc =
4.2, as long as older platforms don't break outright.
We have a buildfarm animal that still uses gcc
At 2015-04-03 00:33:10 +0300, hlinn...@iki.fi wrote:
I came up with the attached.
I like it very much.
src/port/Makefile has (note src/srv):
+# pg_crc32c_sse42.o and its _src.o version need CFLAGS_SSE42
+pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42)
+pg_crc32c_sse42_srv.o:
On 04/02/2015 12:39 PM, Abhijit Menon-Sen wrote:
At 2015-03-25 19:18:51 +0200, hlinn...@iki.fi wrote:
I think we'll need a version check there. […]
You want to write that or should I?
I'm not familiar with MSVC at all, so it would be nice if you did it.
Thinking more about the configure
On 2015-04-02 20:57:24 +0530, Abhijit Menon-Sen wrote:
At 2015-04-02 17:58:23 +0300, hlinn...@iki.fi wrote:
We're only using inline assembly to force producing SSE 4.2 code, even
when -msse4.2 is not used. That feels wrong.
Why? It feels OK to me (and to Andres, per earlier discussions
At 2015-04-02 17:58:23 +0300, hlinn...@iki.fi wrote:
We're only using inline assembly to force producing SSE 4.2 code, even
when -msse4.2 is not used. That feels wrong.
Why? It feels OK to me (and to Andres, per earlier discussions about
exactly this topic). Doing it this way allows the binary
On 02/12/2015 09:26 PM, Heikki Linnakangas wrote:
On 02/11/2015 04:20 PM, Abhijit Menon-Sen wrote:
At 2015-02-11 13:20:29 +0200, hlinnakan...@vmware.com wrote:
I don't follow. I didn't change configure at all, compared to your
patch.
OK, I extrapolated a little too much. Your patch didn't
On 03/25/2015 07:20 PM, Andres Freund wrote:
On 2015-03-25 19:18:51 +0200, Heikki Linnakangas wrote:
Or better yet, a direct configure test to check if the
intrinsic exists - that way we get to also use it on Intel compilers, which
I believe also has the same intrinsics.
Maybe I'm missing
On 2015-03-25 19:18:51 +0200, Heikki Linnakangas wrote:
I was just about to commit the attached, which is the same as the previous
patch with just cosmetic comment changes, but then I realized that this
probably doesn't compile with Visual Studio 2005 or older. The code does
#ifdef _MSC_VER,
On 02/11/2015 04:20 PM, Abhijit Menon-Sen wrote:
At 2015-02-11 13:20:29 +0200, hlinnakan...@vmware.com wrote:
I don't follow. I didn't change configure at all, compared to your
patch.
OK, I extrapolated a little too much. Your patch didn't actually include
crc_instructions.h;
Oh, I'm
At 2015-02-10 14:30:51 +0200, hlinnakan...@vmware.com wrote:
I propose that we add a new header file in src/port, let's call it
crc_instructions.h […] the point of the crc_instructions.h header
is to hide the differences between compilers and architectures.
Moving the assembly code/compiler
On 02/11/2015 11:28 AM, Abhijit Menon-Sen wrote:
1. I don't mind moving platform-specific tests for CRC32C instructions
to configure, but if we only define PG_HAVE_CRC32C_INSTRUCTIONS, we
would anyway have to reproduce all that platform-specific stuff in
the header file. To do it
On 02/09/2015 03:20 PM, Abhijit Menon-Sen wrote:
At 2015-02-09 12:52:41 +0200, hlinnakan...@vmware.com wrote:
Do you have access to big-endian hardware to test this on?
Yes, I tested it on a Linux/ppc system. I wasn't speculating when I said
it does make a noticeable difference, though I'm
On 01/09/2015 10:32 AM, Abhijit Menon-Sen wrote:
2. The sse4.2 patch has only some minor compile fixes.
I have built and tested both patches individually on little-endian
(amd64) and big-endian (ppc) systems. I verified that the _sse is
chosen at startup on the former, and _sb8 on the latter,
On 02/08/2015 08:33 PM, Abhijit Menon-Sen wrote:
At 2015-02-08 18:46:30 +0200, hlinnakan...@vmware.com wrote:
So I propose to move pg_crc.c to src/common, and move the tables
from pg_crc_tables.h directly to pg_crc.c. Thoughts?
Sounds fine to me.
Ok, I've committed a patch that just moves
At 2015-02-09 12:52:41 +0200, hlinnakan...@vmware.com wrote:
Ok, I've committed a patch that just moves the existing code to
common/pg_crc.c […]
Thanks, looks good.
Attached is a rebased version of the slicing-by-8 patch.
Looks OK too.
Do you have access to big-endian hardware to test
At 2015-02-08 18:46:30 +0200, hlinnakan...@vmware.com wrote:
So I propose to move pg_crc.c to src/common, and move the tables
from pg_crc_tables.h directly to pg_crc.c. Thoughts?
Sounds fine to me.
-- Abhijit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make
On 01/09/2015 10:32 AM, Abhijit Menon-Sen wrote:
1. The slicing-by-8 patch contains numerous changes:
With this patch, CALC_CRC32C is no longer a pure macro, but includes a
function call. That means that you can no longer just #include pg_crc.h
and pg_crc_tables.h in an external program. We
On 2015-02-08 18:46:30 +0200, Heikki Linnakangas wrote:
So I propose to move pg_crc.c to src/common, and move the tables from
pg_crc_tables.h directly to pg_crc.c. Thoughts?
+1.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL
Hi Heikki.
I've attached two regenerated CRC patches, split up as before.
1. The slicing-by-8 patch contains numerous changes:
a. A configure test for __builtin_bswap32
b. A comment referencing the slicing-by-8 paper (which is behind a
paywall, unfortunately, so I haven't even
On 01/01/2015 09:17 AM, Abhijit Menon-Sen wrote:
Hi.
OK, here are the patches with the various suggestions applied.
I found that the alignment didn't seem to make much difference for the
CRC32* instructions, so I changed to process (len/8)*8bytes followed by
(len%8)*1bytes, the way the Linux
Hi.
OK, here are the patches with the various suggestions applied.
I found that the alignment didn't seem to make much difference for the
CRC32* instructions, so I changed to process (len/8)*8bytes followed by
(len%8)*1bytes, the way the Linux kernel does.
-- Abhijit
From
On 12/30/2014 09:40 AM, Abhijit Menon-Sen wrote:
Hi.
I'm re-attaching the two patches as produced by format-patch. I haven't
listed any reviewers. It's either just Andres, or maybe a lot of people.
Is anyone in a position to try out the patches on MSVC and see if they
build and work sensibly,
At 2014-12-30 16:05:50 +0200, hlinnakan...@vmware.com wrote:
A couple of quick comments:
bswap32 is unused on on little-endian systems. That will give a
compiler warning.
Huh. I don't get a warning, even when I add -Wunused to the build flags.
But since you mention it, it would be better to
On 2014-12-30 21:36:19 +0530, Abhijit Menon-Sen wrote:
Thanks for spotting that. I had meant to change the test to 7. But
again, now that you mention it, I'm not sure it's necessary. The CRC32*
instructions don't have the usual SSE alignment requirements, and I see
that the Linux kernel
Hi,
On 2014-12-25 11:57:29 +0530, Abhijit Menon-Sen wrote:
-extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+extern void pg_init_comp_crc32c(void);
How about pg_choose_crc_impl() or something?
+extern pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data,
At 2014-12-29 13:22:28 +0100, and...@2ndquadrant.com wrote:
How about pg_choose_crc_impl() or something?
Done.
_sb8? Unless I miss something it's not slice by 8 but rather bytewise?
This is meant to apply on top of the earlier patches I posted to
implement slice-by-8. I'll attach both here.
Hi.
I'm re-attaching the two patches as produced by format-patch. I haven't
listed any reviewers. It's either just Andres, or maybe a lot of people.
Is anyone in a position to try out the patches on MSVC and see if they
build and work sensibly, please? (Otherwise it may be better to remove
those
On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
Hi.
Here's a proposed patch to use CPUID at startup to determine if the
SSE4.2 CRC instructions are available, to use them instead of the
slice-by-8 implementation (posted earlier).
A few notes:
1. GCC has included
On December 26, 2014 4:50:33 PM CET, Bruce Momjian br...@momjian.us wrote:
On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
Hi.
Here's a proposed patch to use CPUID at startup to determine if the
SSE4.2 CRC instructions are available, to use them instead of the
slice-by-8
On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
On December 26, 2014 4:50:33 PM CET, Bruce Momjian br...@momjian.us wrote:
On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
Hi.
Here's a proposed patch to use CPUID at startup to determine if the
SSE4.2 CRC
On December 26, 2014 6:05:34 PM CET, Bruce Momjian br...@momjian.us wrote:
On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
On December 26, 2014 4:50:33 PM CET, Bruce Momjian br...@momjian.us
wrote:
On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
Hi.
On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
Uh, what if the system is compiled on a different CPU that it
is
run on? Seems we would need a run-time CPU test.
That's the cpuid thing mentioned above.
Is this something that could potentially change the data stored on disk?
At 2014-12-26 13:11:43 -0500, br...@momjian.us wrote:
Is this something that could potentially change the data stored on
disk? Does pg_upgrade need to check for changes in this area? Is the
detection exposed by pg_controldata? Could this affect running the
data directory on a different CPU?
On Fri, Dec 26, 2014 at 11:52:41PM +0530, Abhijit Menon-Sen wrote:
At 2014-12-26 13:11:43 -0500, br...@momjian.us wrote:
Is this something that could potentially change the data stored on
disk? Does pg_upgrade need to check for changes in this area? Is the
detection exposed by
At 2014-12-26 13:48:40 -0500, br...@momjian.us wrote:
I assume you are only linking into Heikki's new code and will not
change the places that use the old CRC method on disk --- just
checking.
Correct. The legacy CRC computation code used by ltree etc. is
completely unaffected by both my sets
Hi.
Here's a proposed patch to use CPUID at startup to determine if the
SSE4.2 CRC instructions are available, to use them instead of the
slice-by-8 implementation (posted earlier).
A few notes:
1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
use. It can be replaced with
On Thu, Nov 27, 2014 at 11:19:51AM +0530, Abhijit Menon-Sen wrote:
At 2014-11-26 18:56:52 -0500, br...@momjian.us wrote:
I would test it with fsync=off to remove the fsync delay. That will
simulate an SSD or BBU controller.
The earlier tests were run with fsync=off.
I ran another
On Fri, Nov 21, 2014 at 07:49:45AM -0600, k...@rice.edu wrote:
Hi,
This indicates that another part of the system is the resource limit,
not the CRC calculation. My money is on the I/O system. Try it using
an in memory filesystem and see if that matters. Even if it is still
the same
At 2014-11-26 18:56:52 -0500, br...@momjian.us wrote:
I would test it with fsync=off to remove the fsync delay. That will
simulate an SSD or BBU controller.
The earlier tests were run with fsync=off.
I ran another round of the replay tests on the i7 server, this time
replaying 30GB of WAL
At 2014-11-20 13:47:00 +0530, a...@2ndquadrant.com wrote:
Suggestions for how to address (b) are welcome.
With help from Andres, I set up a workload where XLogInsert* was at the
top of my profiles: server with fsync and synchronous_commit off, and
pgbench running a multiple-row insert into a
On 11/21/2014 12:11 PM, Abhijit Menon-Sen wrote:
At 2014-11-20 13:47:00 +0530, a...@2ndquadrant.com wrote:
Suggestions for how to address (b) are welcome.
With help from Andres, I set up a workload where XLogInsert* was at the
top of my profiles: server with fsync and synchronous_commit
On 2014-11-21 13:01:50 +0200, Heikki Linnakangas wrote:
On 11/21/2014 12:11 PM, Abhijit Menon-Sen wrote:
At 2014-11-20 13:47:00 +0530, a...@2ndquadrant.com wrote:
Suggestions for how to address (b) are welcome.
With help from Andres, I set up a workload where XLogInsert* was at the
top of
On 11/21/2014 01:06 PM, Andres Freund wrote:
On 2014-11-21 13:01:50 +0200, Heikki Linnakangas wrote:
On 11/21/2014 12:11 PM, Abhijit Menon-Sen wrote:
At 2014-11-20 13:47:00 +0530, a...@2ndquadrant.com wrote:
Suggestions for how to address (b) are welcome.
With help from Andres, I set up a
On Fri, Nov 21, 2014 at 12:11 PM, Abhijit Menon-Sen a...@2ndquadrant.com
wrote:
If anyone has other suggestions, I'm all ears.
Do you have a WIP patch I could take a look at and tweak? Maybe
there's something about the compilers code generation that could be
improved.
Regards,
Ants Aasma
--
At 2014-11-20 09:52:01 +0530, a...@2ndquadrant.com wrote:
To address (a), I am timing a standby restoring the same 11GB of WAL
via restore_command with and without the CRC patch.
I ran perf record -F 100 --call-graph=dwarf bin/postgres -D backup,
where:
(a) bin/postgres was compiled, as
At 2014-11-11 16:56:00 +0530, a...@2ndquadrant.com wrote:
I'm working on this (first speeding up the default calculation using
slice-by-N, then adding support for the SSE4.2 CRC instruction on
top).
I've done the first part in the attached patch, and I'm working on the
second (especially the
On 11/19/2014 05:58 PM, Abhijit Menon-Sen wrote:
At 2014-11-11 16:56:00 +0530, a...@2ndquadrant.com wrote:
I'm working on this (first speeding up the default calculation using
slice-by-N, then adding support for the SSE4.2 CRC instruction on
top).
I've done the first part in the attached
Robert Haas robertmh...@gmail.com writes:
On Wed, Nov 19, 2014 at 11:44 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
That's an interesting choice of workload. That sure is heavy on the CRC
calculation, but the speed of pg_xlogdump hardly matters in real life.
But isn't a workload
On 11/19/2014 06:49 PM, Robert Haas wrote:
On Wed, Nov 19, 2014 at 11:44 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
That's an interesting choice of workload. That sure is heavy on the CRC
calculation, but the speed of pg_xlogdump hardly matters in real life.
But isn't a workload
On 2014-11-19 19:12:22 +0200, Heikki Linnakangas wrote:
On 11/19/2014 06:49 PM, Robert Haas wrote:
On Wed, Nov 19, 2014 at 11:44 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
That's an interesting choice of workload. That sure is heavy on the CRC
calculation, but the speed of
At 2014-11-19 19:12:22 +0200, hlinnakan...@vmware.com wrote:
But pg_xlogdump's way of using the CRC isn't necessarily
representative of how the backend uses it. It's probably pretty close
to WAL replay in the server, but even there the server might be hurt
more by the extra cache used by the
At 2014-11-04 14:40:36 +0100, and...@2ndquadrant.com wrote:
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
Are you going to get the slice-by-N stuff working next, to speed
this up?
I don't plan to do anything serious with it, but I've hacked up the
crc code to use the hardware
On Tue, Nov 11, 2014 at 6:26 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
At 2014-11-04 14:40:36 +0100, and...@2ndquadrant.com wrote:
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
Are you going to get the slice-by-N stuff working next, to speed
this up?
I don't plan to do anything
On 11/04/2014 03:21 PM, Robert Haas wrote:
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I hear none, so committed with some small fixes.
Are you going to get the slice-by-N stuff working next, to speed this up?
I don't have any concrete plans, but yeah,
On Tue, Nov 4, 2014 at 3:17 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:
I came up with the attached patches. They do three things:
1. Get rid of the 64-bit CRC code. It's not used for anything, and
haven't been for years, so it
On 11/07/2014 07:08 AM, Amit Kapila wrote:
On Tue, Nov 4, 2014 at 3:17 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:
I came up with the attached patches. They do three things:
1. Get rid of the 64-bit CRC code. It's not used for
On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:
I came up with the attached patches. They do three things:
1. Get rid of the 64-bit CRC code. It's not used for anything, and
haven't been for years, so it doesn't seem worth spending any effort to
fix them.
2. Switch to CRC-32C (Castagnoli)
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I hear none, so committed with some small fixes.
Are you going to get the slice-by-N stuff working next, to speed this up?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
I hear none, so committed with some small fixes.
Are you going to get the slice-by-N stuff working next, to speed this up?
I don't plan to do anything serious
On 10/09/2014 12:13 AM, Andres Freund wrote:
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
correspond to any bit-by-bit CRC variant and polynomial. My math skills are
not strong enough to determine what the
On 10/09/2014 01:23 AM, Gavin Flower wrote:
On 09/10/14 10:13, Andres Freund wrote:
If we're switching to a saner computation, we should imo also switch to
a better polynom - CRC-32C has better error detection capabilities than
CRC32 and is available in hardware. As we're paying the price pf
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
Hmm. So the simple, non-table driven, calculation gives the same result as
using the lookup table using the reflected lookup code. That's expected; the
lookup method is supposed to be the same, just faster. However, using the
normal
On 09/10/14 10:13, Andres Freund wrote:
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
Hmm. So the simple, non-table driven, calculation gives the same result as
using the lookup table using the reflected lookup code. That's expected; the
lookup method is supposed to be the same, just
65 matches
Mail list logo