Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-10 Thread Robert Haas
On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote:
 Do you have a better alternative? Making the computation unconditionally
 64bit will have a runtime overhead and adding a StaticAssert in the
 existing macro doesn't work because we use it in array sizes where gcc
 balks.
 We could try using inline functions, but that's not going to be pretty
 either.

 I don't really see that many further usecases that will align 64bit
 values on 32bit platforms, so I think we're ok for now.

I'd be inclined to make the computation unconditionally 64-bit.  I
doubt the speed penalty is enough to worry about, and I think we're
going to have more and more cases where optimizing for 32-bit
platforms is just not the right decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-10 Thread Andres Freund
On 2013-10-10 08:59:47 -0400, Robert Haas wrote:
 On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote:
  Do you have a better alternative? Making the computation unconditionally
  64bit will have a runtime overhead and adding a StaticAssert in the
  existing macro doesn't work because we use it in array sizes where gcc
  balks.
  We could try using inline functions, but that's not going to be pretty
  either.
 
  I don't really see that many further usecases that will align 64bit
  values on 32bit platforms, so I think we're ok for now.
 
 I'd be inclined to make the computation unconditionally 64-bit.  I
 doubt the speed penalty is enough to worry about, and I think we're
 going to have more and more cases where optimizing for 32-bit
 platforms is just not the right decision.

MAXALIGN is used in several of PG's hottest functions in many
scenarios. att_align_nominal is used in slot_deform_tuple,
heap_deform_tuple, nocachegetattr, etc. So I don't think that's viable
yet. At least not with much more benefit than this...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-08 Thread Robert Haas
On Mon, Oct 7, 2013 at 4:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-07 13:25:17 -0400, Robert Haas wrote:
 On Fri, Oct 4, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote:
  Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
  bigger than 32bit?
 
  #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
  #define TYPEALIGN(ALIGNVAL,LEN)  \
  (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) 
  - 1)))

 Isn't the problem, more specifically, that it doesn't work for values
 larger than an intptr_t?

 Well, yes. And intptr_t is 32bit wide on a 32bit platform.

 And does that indicate that intptr_t is the wrong type to be using here?

 No, I don't think so. intptr_t is defined to be a integer type to which
 you can cast a pointer, cast it back and still get the old value. On
 32bit platforms it usually will be 32bit wide.
 All that's fine for the classic usages of TYPEALIGN where it's used on
 pointers or lengths of stuff stored in memory. Those will always fit in
 32bit on a 32bit platform. But here we're using it on explicit 64bit
 types (XLogRecPtr).
 Now, you could argue that we should make it use 64bit math everywhere -
 but I think that might incur quite the price on some 32bit
 platforms. It's used in the tuple decoding stuff, that's quite the hot
 path in some workloads.

 So I guess it's either a separate macro, or we rewrite that piece of
 code to work slightly differently and work directly on the lenght or
 such.

 Maybe we should add a StaticAssert ensuring the TYPEALIGN macro only
 gets passed 32bit types?

I think having two macros that behave identically on all platforms
anyone cares about[1] but which can cause difficult-to-find
corner-case-bugs on other platforms is just a recipe for disaster.  I
pledge to screw that up at least once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] And by anyone, I mean me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-08 Thread Andres Freund
On 2013-10-08 12:26:17 -0400, Robert Haas wrote:
 On Mon, Oct 7, 2013 at 4:47 PM, Andres Freund and...@2ndquadrant.com wrote:
  So I guess it's either a separate macro, or we rewrite that piece of
  code to work slightly differently and work directly on the lenght or
  such.
 
  Maybe we should add a StaticAssert ensuring the TYPEALIGN macro only
  gets passed 32bit types?
 
 I think having two macros that behave identically on all platforms
 anyone cares about[1] but which can cause difficult-to-find
 corner-case-bugs on other platforms is just a recipe for disaster.  I
 pledge to screw that up at least once.

Do you have a better alternative? Making the computation unconditionally
64bit will have a runtime overhead and adding a StaticAssert in the
existing macro doesn't work because we use it in array sizes where gcc
balks.
We could try using inline functions, but that's not going to be pretty
either.

I don't really see that many further usecases that will align 64bit
values on 32bit platforms, so I think we're ok for now.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-08 Thread David Rowley
On Wed, Oct 9, 2013 at 5:26 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Oct 7, 2013 at 4:47 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2013-10-07 13:25:17 -0400, Robert Haas wrote:
  On Fri, Oct 4, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com
 wrote:
   Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
   bigger than 32bit?
  
   #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF,
 (LEN))
   #define TYPEALIGN(ALIGNVAL,LEN)  \
   (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t)
 ((ALIGNVAL) - 1)))
 
  Isn't the problem, more specifically, that it doesn't work for values
  larger than an intptr_t?
 
  Well, yes. And intptr_t is 32bit wide on a 32bit platform.
 
  And does that indicate that intptr_t is the wrong type to be using here?
 
  No, I don't think so. intptr_t is defined to be a integer type to which
  you can cast a pointer, cast it back and still get the old value. On
  32bit platforms it usually will be 32bit wide.
  All that's fine for the classic usages of TYPEALIGN where it's used on
  pointers or lengths of stuff stored in memory. Those will always fit in
  32bit on a 32bit platform. But here we're using it on explicit 64bit
  types (XLogRecPtr).
  Now, you could argue that we should make it use 64bit math everywhere -
  but I think that might incur quite the price on some 32bit
  platforms. It's used in the tuple decoding stuff, that's quite the hot
  path in some workloads.
 
  So I guess it's either a separate macro, or we rewrite that piece of
  code to work slightly differently and work directly on the lenght or
  such.
 
  Maybe we should add a StaticAssert ensuring the TYPEALIGN macro only
  gets passed 32bit types?

 I think having two macros that behave identically on all platforms
 anyone cares about[1] but which can cause difficult-to-find
 corner-case-bugs on other platforms is just a recipe for disaster.  I
 pledge to screw that up at least once.


The only improvement I thought of during writing the patch was to rename
MAXALIGN to something more related to RAM, like perhaps RAMMAXALIGN or
MEMORYMAXALIGN, so callers might think twice if they're using it for
anything apart from memory addresses. I didn't really come up with a name I
thought was good enough to warrant the change, so I left it as it was.

I also don't think it is perfect that both the marcos do the same thing on
a 64bit compilation, but I think I was in the same boat as you... couldn't
think of anything better.

If you can think of a name that will confuse users less then maybe it's
worth making the change.

David


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 [1] And by anyone, I mean me.



Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-07 Thread Robert Haas
On Fri, Oct 4, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
 bigger than 32bit?

 #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
 #define TYPEALIGN(ALIGNVAL,LEN)  \
 (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) - 
 1)))

Isn't the problem, more specifically, that it doesn't work for values
larger than an intptr_t?

And does that indicate that intptr_t is the wrong type to be using here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-07 Thread Andres Freund
On 2013-10-07 13:25:17 -0400, Robert Haas wrote:
 On Fri, Oct 4, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote:
  Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
  bigger than 32bit?
 
  #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
  #define TYPEALIGN(ALIGNVAL,LEN)  \
  (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) - 
  1)))
 
 Isn't the problem, more specifically, that it doesn't work for values
 larger than an intptr_t?

Well, yes. And intptr_t is 32bit wide on a 32bit platform.

 And does that indicate that intptr_t is the wrong type to be using here?

No, I don't think so. intptr_t is defined to be a integer type to which
you can cast a pointer, cast it back and still get the old value. On
32bit platforms it usually will be 32bit wide.
All that's fine for the classic usages of TYPEALIGN where it's used on
pointers or lengths of stuff stored in memory. Those will always fit in
32bit on a 32bit platform. But here we're using it on explicit 64bit
types (XLogRecPtr).
Now, you could argue that we should make it use 64bit math everywhere -
but I think that might incur quite the price on some 32bit
platforms. It's used in the tuple decoding stuff, that's quite the hot
path in some workloads.

So I guess it's either a separate macro, or we rewrite that piece of
code to work slightly differently and work directly on the lenght or
such.

Maybe we should add a StaticAssert ensuring the TYPEALIGN macro only
gets passed 32bit types?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-07 Thread Heikki Linnakangas

On 07.10.2013 23:47, Andres Freund wrote:

On 2013-10-07 13:25:17 -0400, Robert Haas wrote:

And does that indicate that intptr_t is the wrong type to be using here?


No, I don't think so. intptr_t is defined to be a integer type to which
you can cast a pointer, cast it back and still get the old value. On
32bit platforms it usually will be 32bit wide.
All that's fine for the classic usages of TYPEALIGN where it's used on
pointers or lengths of stuff stored in memory. Those will always fit in
32bit on a 32bit platform. But here we're using it on explicit 64bit
types (XLogRecPtr).


Yep.


Now, you could argue that we should make it use 64bit math everywhere -
but I think that might incur quite the price on some 32bit
platforms. It's used in the tuple decoding stuff, that's quite the hot
path in some workloads.

So I guess it's either a separate macro, or we rewrite that piece of
code to work slightly differently and work directly on the lenght or
such.


Committed what is pretty much David's original patch.


Maybe we should add a StaticAssert ensuring the TYPEALIGN macro only
gets passed 32bit types?


I tried that, like this:

--- a/src/include/c.h
+++ b/src/include/c.h
@@ -532,7 +532,7 @@ typedef NameData *Name;
  */

 #define TYPEALIGN(ALIGNVAL,LEN)  \
-   (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) - 1)))
+	(	StaticAssertExpr(sizeof(LEN) = 4, type is too wide), ((intptr_t) 
(LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) - 1)))


 #define SHORTALIGN(LEN)TYPEALIGN(ALIGNOF_SHORT, (LEN))
 #define INTALIGN(LEN)  TYPEALIGN(ALIGNOF_INT, (LEN))

However, it gave a lot of errors from places where we have something 
like char buf[MaxHeapTuplesPerPage]. MaxHeapTuplesPerPage uses 
MAXALIGN, and gcc doesn't like it when StaticAssertExpr is used in such 
a context (to determine the size of an array).


I temporarily changed places where that was a problem to use a copy of 
TYPEALIGN with no assertion, to verify that there are no more genuine 
bugs like this lurking. It was worth it as a one-off check, but I don't 
think we want to commit that.


Thanks for the report, and analysis!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-04 Thread David Rowley
In HEAD of 9.4 I'm getting the following:

D:\9.4\binpostgres.exe -D d:\9.4\data
LOG:  database system was shut down at 2013-10-05 00:43:33 NZDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
PANIC:  space reserved for WAL record does not match what was written:
CurrPos = 18446744071562067968 EndPos = 2147483648
STATEMENT:  insert into test (items) select x.x from
generate_series(1,1000) x(x);
LOG:  server process (PID 5476) exited with exit code 3
DETAIL:  Failed process was running: insert into test (items) select x.x
from generate_series(1,1000) x(x);
LOG:  terminating any other active server processes

debug_assertions = on

I made a slight change to the line that causes the panic to print out the
values of CurrPos and EndPos, as you can see above.

Only changes made to postgresql.conf are:

checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 15min # range 30s-1h

I discovered this was happening just after I bumped the checkpoint segment
up to 64 for bulk loading some test data.

create table test (
  id serial not null,
  name varchar(64) not null default 'name of something',
  price numeric(10,2) not null default 1000.00,
  active boolean not null default true,
  items int not null default 1,
  description text not null default 'description of item',
  primary key(id)
);

insert into test (items) select x.x from generate_series(1,1000) x(x);

I'm running this on windows 7 64bit with postgres compiled as 32bit.

Regards

David


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-04 Thread Andres Freund
On 2013-10-05 01:05:37 +1300, David Rowley wrote:
 In HEAD of 9.4 I'm getting the following:
 
 D:\9.4\binpostgres.exe -D d:\9.4\data
 LOG:  database system was shut down at 2013-10-05 00:43:33 NZDT
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 PANIC:  space reserved for WAL record does not match what was written:
 CurrPos = 18446744071562067968 EndPos = 2147483648

Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
bigger than 32bit?

#define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
#define TYPEALIGN(ALIGNVAL,LEN)  \
(((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL) - 1)))

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-04 Thread David Rowley
On Sat, Oct 5, 2013 at 1:19 AM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-10-05 01:05:37 +1300, David Rowley wrote:
  In HEAD of 9.4 I'm getting the following:
 
  D:\9.4\binpostgres.exe -D d:\9.4\data
  LOG:  database system was shut down at 2013-10-05 00:43:33 NZDT
  LOG:  database system is ready to accept connections
  LOG:  autovacuum launcher started
  PANIC:  space reserved for WAL record does not match what was written:
  CurrPos = 18446744071562067968 EndPos = 2147483648

 Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
 bigger than 32bit?

 #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
 #define TYPEALIGN(ALIGNVAL,LEN)  \
 (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL)
 - 1)))


It looks that way.

As a quick test I put some printf's around where the MAXALIGN is used:

/* Align the end position, so that the next record starts aligned */
printf(CurrPos == %llu (before MAXALIGN)\n, CurrPos);
CurrPos = MAXALIGN(CurrPos);
printf(CurrPos == %llu (after MAXALIGN)\n, CurrPos);

I got the following just before the PANIC.

CurrPos == 2147483711 (before MAXALIGN)
CurrPos == 18446744071562068032 (after MAXALIGN)

Regards

David


Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-04 Thread David Rowley
On Sat, Oct 5, 2013 at 1:34 AM, David Rowley dgrowle...@gmail.com wrote:

 On Sat, Oct 5, 2013 at 1:19 AM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-10-05 01:05:37 +1300, David Rowley wrote:
  In HEAD of 9.4 I'm getting the following:
 
  D:\9.4\binpostgres.exe -D d:\9.4\data
  LOG:  database system was shut down at 2013-10-05 00:43:33 NZDT
  LOG:  database system is ready to accept connections
  LOG:  autovacuum launcher started
  PANIC:  space reserved for WAL record does not match what was written:
  CurrPos = 18446744071562067968 EndPos = 2147483648

 Could it be that MAXALIGN/TYPEALIGN doesn't really work for values
 bigger than 32bit?

 #define MAXALIGN(LEN)   TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
 #define TYPEALIGN(ALIGNVAL,LEN)  \
 (((intptr_t) (LEN) + ((ALIGNVAL) - 1))  ~((intptr_t) ((ALIGNVAL)
 - 1)))


 It looks that way.

 As a quick test I put some printf's around where the MAXALIGN is used:

 /* Align the end position, so that the next record starts aligned */
  printf(CurrPos == %llu (before MAXALIGN)\n, CurrPos);
 CurrPos = MAXALIGN(CurrPos);
 printf(CurrPos == %llu (after MAXALIGN)\n, CurrPos);

 I got the following just before the PANIC.

 CurrPos == 2147483711 (before MAXALIGN)
 CurrPos == 18446744071562068032 (after MAXALIGN)



So I guess this would also break the 32bit linux builds too.

I've attached a proposed patch which seems to fix the problem.

Regards

David




 Regards

 David




maxalign64.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers