Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-30 Thread Zdenek Kotala

Zoltan Boszormenyi wrote:



We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.



I tried the original S10 gcc (3.4.3) on two different machines with different
kernel updates and both work fine. Based on our off-list communication
and Tom's comment, it looks more like a linking/loading problem, maybe
a library mismatch. I can't say more without a core file.


Zdenek


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-26 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Zdenek Kotala wrote:

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, 
LDFLAGS ...) and is it possible to get a core file?


This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/


I added --enable-debug --enable-depend --enable-cassert
to get a sensible gdb report after that.

The server crashed after psql connected
with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes a segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461 PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
(gdb)

As I described in my experiments, compiling with
--enable-integer-datetimes fixed the issue.


We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL
with the new GCC without --enable-integer-datetimes and it fixed
the problem we experienced. It seems that my suspicion was right:
GCC-3.4.3 on Solaris 10/Sparc is buggy.

--
--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-26 Thread Zoltan Boszormenyi


Oh, and the proof that I use the newly compiled version:

$ psql -h reddb-dev-pgr -p 5477 test
Welcome to psql 8.2.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# select version();
                                  version
----------------------------------------------------------------------------
 PostgreSQL 8.2.3 on sparc-sun-solaris2.10, compiled by GCC gcc (GCC) 4.1.2
(1 row)

test=# show integer_datetimes;
 integer_datetimes
-------------------
 off
(1 row)

--
--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/




[HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
   TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
   StringInfoData buf;

   pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
   pq_sendint64(&buf, timestamp);
#else
   pq_sendfloat8(&buf, timestamp);
#endif
   PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

GDB indicates a crash at the last line.
No matter how I unrolled the function calls,
the crashing line GDB reported was always the one
right after this call:

   pq_sendfloat8(&buf, timestamp);

It must be stack corruption somehow.
I also unrolled pq_sendfloat8() so the function looks like this:

Datum
timestamptz_send(PG_FUNCTION_ARGS)
{
   TimestampTz timestamp = PG_GETARG_TIMESTAMPTZ(0);
   StringInfoData buf;
   bytea   *byteap;
   union
   {
      float8  f;
      int64   i;
   }   swap;
   uint32  n32;

   pq_begintypsend(&buf);
#ifdef HAVE_INT64_TIMESTAMP
   pq_sendint64(&buf, timestamp);
   elog(NOTICE, "timestamptz_send() HAVE_INT64_TIMESTAMP after pq_sendint64");
#else
   swap.f = (float8)timestamp;
   elog(NOTICE, "timestamptz_send() int64: %lld", swap.i);
   /* High order half first, since we're doing MSB-first */
#ifdef INT64_IS_BUSTED
   /* don't try a right shift of 32 on a 32-bit word */
   n32 = (swap.i < 0) ? -1 : 0;
   elog(NOTICE, "timestamptz_send() INT64_IS_BUSTED high 32: %d", n32);
#else
   n32 = (uint32) (swap.i >> 32);
   elog(NOTICE, "timestamptz_send() high 32: %d", n32);
#endif
   n32 = htonl(n32);
   elog(NOTICE, "timestamptz_send() htonl high 32: %d", n32);
   appendBinaryStringInfo(&buf, (char *) &n32, 4);

   /* Now the low order half */
   n32 = (uint32) swap.i;
   elog(NOTICE, "timestamptz_send() low 32: %d", n32);
   n32 = htonl(n32);
   elog(NOTICE, "timestamptz_send() htonl low 32: %d", n32);
   appendBinaryStringInfo(&buf, (char *) &n32, 4);

   elog(NOTICE, "timestamptz_send() pq_sendfloat8");
#endif
   byteap = (bytea *) buf.data;
   elog(NOTICE, "timestamptz_send() buf->data = %p", byteap);
   Assert(buf.len >= VARHDRSZ);
   VARATT_SIZEP(byteap) = buf.len;
   PG_RETURN_BYTEA_P(byteap);
}

The crashing line according to GDB is now the elog() call after:

   swap.f = (float8)timestamp;

This is a simple explicit type cast that shouldn't cause problems.
However, it is the one that somehow corrupts something on the stack
and causes the segfault upon entering the function called in the next
statement.
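
For readers following the unrolled code, here is a minimal standalone
sketch of the same wire encoding outside PostgreSQL: the float8 bit pattern
is emitted as eight bytes, most significant first, which is equivalent to
sending the high and low 32-bit halves through htonl(). The names
encode_float8_be() and decode_float8_be() are hypothetical, and the sketch
assumes an 8-byte IEEE 754 double. It uses memcpy() instead of the union
from the function above, so it is an illustration, not the PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a PostgreSQL function):
 * write a double as 8 bytes, most significant byte first. */
void
encode_float8_be(double value, unsigned char out[8])
{
    uint64_t bits;
    int      i;

    /* Assumption: double is 8 bytes wide, as on SPARC and x86. */
    assert(sizeof(double) == sizeof(uint64_t));

    /* Copy the bit pattern; this does not convert the numeric value. */
    memcpy(&bits, &value, sizeof bits);

    /* High-order byte first, i.e. network byte order. */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char) (bits >> (56 - 8 * i));
}

/* Hypothetical inverse: rebuild the double from 8 MSB-first bytes. */
double
decode_float8_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double   value;
    int      i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Encoding a value and decoding the resulting eight bytes should give back
the same double on any machine, regardless of its native byte order.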

As a workaround, we recompiled PostgreSQL 8.2.3 with
--enable-integer-datetimes
and the client can connect to the server now, after initdb.
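
To make the workaround concrete: the configure switch selects which
representation timestamptz_send() puts on the wire. With
--enable-integer-datetimes a timestamp is an int64 count of microseconds
since 2000-01-01, otherwise it is a float8 count of seconds since that
date, so the float8 branch is not compiled in at all. A rough sketch of
the two representations follows. The function names and the
PG_EPOCH_OFFSET_SECS macro are made up for illustration and are not
PostgreSQL identifiers.

```c
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01) and 2000-01-01,
 * i.e. 10957 days. Illustrative name, not a PostgreSQL macro. */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)

/* Integer datetimes: microseconds since 2000-01-01. */
int64_t
unix_to_int64_timestamp(int64_t unix_secs)
{
    return (unix_secs - PG_EPOCH_OFFSET_SECS) * INT64_C(1000000);
}

/* Float datetimes: seconds since 2000-01-01 as a double. */
double
unix_to_float8_timestamp(int64_t unix_secs)
{
    return (double) (unix_secs - PG_EPOCH_OFFSET_SECS);
}
```

Both functions describe the same instant; only the unit and the C type
differ, which is why the two builds take different code paths when sending.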

I tried to exercise timestamptz_send(), but creating a table
with a float8 field, then INSERTing and SELECTing, works too.
Both textual and binary COPY FROM and COPY TO work as well.
Either these exercises didn't call pq_sendfloat8(), or it
doesn't cause problems elsewhere, only in timestamptz_send().


--
--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/






Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zdenek Kotala

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, LDFLAGS 
...) and is it possible to get a core file?


Did you try compiling with different optimization flags, or did you try 
the Sun Studio compiler?


Zdenek



Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 we have found that psql in PostgreSQL 8.2.3
 has problems connecting to the server
 running on Solaris 10/Sun SPARC.
 ...
 It seems that somehow the system provided
 GCC 3.4.3 miscompiles timestamptz_send()
 and it segfaults.

I find it fairly hard to believe that timestamptz_send would be invoked
at all while using psql, much less during initial connection.  psql
doesn't do any binary-output requests.

regards, tom lane



Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Zdenek Kotala wrote:

Zoltan Boszormenyi wrote:

Hi,

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.

$ uname -a
SunOS dev-machine 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V440

It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults. The default function looks like this:



Can you send me how you compiled Postgres (configure switches, LDFLAGS 
...) and is it possible to get a core file?


This was the configure line:

./configure --prefix=/export/local/postgresql/postgresql-8.2.3 
--with-includes=/usr/local/include --with-libraries=/usr/local/lib/


I added --enable-debug --enable-depend --enable-cassert
to get a sensible gdb report after that.

The server crashed after psql connected
with these commands:

$ psql -l -h dev-machine -p 5477 -U user
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
$ psql -h dev-machine -p 5477 -U user template1
psql: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

If the user doesn't have permissions in e.g. pg_hba.conf
then I get the correct permission denied error.
If the user can connect then some statement inside psql
causes a segfault in the server.

Compiled with debug info, I got this from gdb on the core file:
$ gdb /.../pgsql/bin/postgres /.../data/core
...
Program terminated with signal 11, Segmentation fault.
#0  0x0021c8a0 in timestamptz_send (fcinfo=0x1) at timestamp.c:461
461 PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
(gdb)

As I described in my experiments, compiling with
--enable-integer-datetimes fixed the issue.




Did you try compiling with different optimization flags, or did you try 
the Sun Studio compiler?


No, and no. Sun Studio isn't installed, only gcc.



Zdenek



--
--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/




Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc

2007-03-23 Thread Zoltan Boszormenyi

Tom Lane wrote:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

we have found that psql in PostgreSQL 8.2.3
has problems connecting to the server
running on Solaris 10/Sun SPARC.
...
It seems that somehow the system provided
GCC 3.4.3 miscompiles timestamptz_send()
and it segfaults.



I find it fairly hard to believe that timestamptz_send would be invoked
at all while using psql, much less during initial connection.  psql
doesn't do any binary-output requests.

regards, tom lane
  


Then please explain this miracle.
Anyway, your comment makes my suspicion about
the correctness of GCC-3.4.3 on Solaris 10/sparc
better founded now. :-)

--
--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/

