Re: [GENERAL] error updating a tuple after promoting a standby

2016-12-22 Thread Brian Sutherland
Perhaps try 9.5.5 which has a fix for a problem with the same symptoms:

https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
https://www.postgresql.org/docs/9.5/static/release-9-5-5.html#AEN126074

On Wed, Dec 21, 2016 at 10:51:47AM +0100, Tom DalPozzo wrote:
> Hi,
> I was doing some tests with backup, replication, standby. After promoting a
> standby server, I found my db in a condition that raises me an error while
> trying to update a particular tuple.
> Below here you can se my UPDATE statment and the error raised.
> The select * from stato where id=409; executed immediately after worked
> well however.
> I checked the file and it's readable.
> Before my standby promotion test I performed millions of this UPDATE
> statments without problem on my db.
> I can not reproduce the issue.
> 
> Perhaps I did something wrong during my test but I don't know what. I
> didn't touch any file in base directory however.
> Anyway I'd like to know if in your opinion it's possible that this error
> was caused by something wrong done by me or if it should never happen as
> the file is perfectly readable.
> 
> Regards
> Pupillo
> 
> 
> 
> 
> psql (9.5.4)
> Type "help" for help.
> 
> ginopino=# UPDATE stato SET
> dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
> WHERE id=409;
> ERROR:  could not read block 12281 in file "base/16384/29153": read only 0
> of 8192 bytes
> ginopino=# select * from stato where id=409;  <<< IT WORKS FINE

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] After replication failover: could not read block X in file Y read only 0 of 8192 bytes

2016-05-31 Thread Brian Sutherland
On Tue, May 31, 2016 at 04:49:26PM +1000, Venkata Balaji N wrote:
> On Mon, May 30, 2016 at 11:37 PM, Brian Sutherland <br...@vanguardistas.net>
> wrote:
> 
> > I'm running a streaming replication setup with PostgreSQL 9.5.2 and have
> > started seeing these errors on a few INSERTs:
> >
> > ERROR:  could not read block 8 in file "base/3884037/3885279": read
> > only 0 of 8192 bytes
> >
> 
> These errors are occurring on master or slave ?

On the master (which was previously a slave)

> > on a few tables. If I look at that specific file, it's only 6 blocks
> > long:
> >
> > # ls -la base/3884037/3885279
> > -rw--- 1 postgres postgres 49152 May 30 12:56 base/3884037/3885279
> >
> > It seems that this is the case on most tables in this state. I havn't
> > seen any error on SELECT and I can SELECT * on the all tables I know
> > have this problem. The database is machine is under reasonable load.
> >
> 
> So, the filenodes generating this error belong to a Table ? or an Index ?

So far I have found 3 tables with this issue, 2 were pg_statistic in
different databases. The one referenced above is definitely a table:
"design_file".

The usage pattern on that table is to DELETE and later INSERT a few
hundred rows at a time on an occasional basis. The table is very small,
680 rows.

> > On some tables an "ANALYZE tablename" causes the error.

I discovered why ANALYZE raised an error, it was because pg_statistic
was affected. "vacuum full verbose pg_statistic;" fixed it. Hoping any
missing statistics get re-generated.

> > We recently had a streaming replication failover after loading a large
> > amount of data with pg_restore. The problems seem to have started after
> > that, but I'm not perfectly sure.
> 
> pg_restore has completed successfully ? 

pg_restore did complete successfully 

> When pg_restore was running, did
> you see anything suspicious in the postgresql logfiles ?

The restore happened on the old master. The logfile was long since
deleted :(

> I have data_checksums switched on so am suspecting a streaming
> > replication bug.  Anyone know of a recent bug which could have caused
> > this?
> >
> 
> I cannot conclude at this point. I encountered these kind of errors with
> Indexes and re-indexing fixed them.

This is actually the second time I am seeing these kinds of errors, in
the past, after verifying that no data was lost I used VACUUM FULL to
recover the ability to INSERT. There was no pitchfork uprising...

> Regards,
> Venkata B N
> 
> Fujitsu Australia

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] After replication failover: could not read block X in file Y read only 0 of 8192 bytes

2016-05-30 Thread Brian Sutherland
I'm running a streaming replication setup with PostgreSQL 9.5.2 and have
started seeing these errors on a few INSERTs:

ERROR:  could not read block 8 in file "base/3884037/3885279": read only 0 
of 8192 bytes

on a few tables. If I look at that specific file, it's only 6 blocks
long:

# ls -la base/3884037/3885279
-rw--- 1 postgres postgres 49152 May 30 12:56 base/3884037/3885279

It seems that this is the case on most tables in this state. I havn't
seen any error on SELECT and I can SELECT * on the all tables I know
have this problem. The database is machine is under reasonable load.

On some tables an "ANALYZE tablename" causes the error.

We recently had a streaming replication failover after loading a large
amount of data with pg_restore. The problems seem to have started after
that, but I'm not perfectly sure.

I have data_checksums switched on so am suspecting a streaming
replication bug.  Anyone know of a recent bug which could have caused
this?

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Failure loading materialized view with pg_restore

2015-02-19 Thread Brian Sutherland
On Wed, Feb 18, 2015 at 10:34:33AM -0500, Tom Lane wrote:
 Brian Sutherland br...@vanguardistas.net writes:
  If I run this set of commands against PostgreSQL 9.4.1 I pg_restore
  throws an error with a permission problem. Why it does so is a mystery
  to me, given that the user performing the restore is a superuser:
 
 The same thing would happen without any dump and restore:
 
 regression=# create user nobody;
 CREATE ROLE
 regression=# CREATE TABLE x (y int); 
 CREATE TABLE
 regression=# CREATE MATERIALIZED VIEW myview AS select * from x;
 SELECT 0
 regression=# ALTER TABLE myview OWNER TO nobody;
 ALTER TABLE
 regression=# REFRESH MATERIALIZED VIEW myview;
 ERROR:  permission denied for relation x
 
 User nobody does not have permission to read table x, so the REFRESH
 fails, because the view's query executes as the view's owner.

If you grant select permission for the user nobody on x, pg_restore
still fails even though a REFRESH succeeds:

# superuser creates database and materialized view
createuser -s super

createdb --username super orig
psql --username super -c select 'USING:' || version(); orig
psql --username super -c 'CREATE TABLE x (y int);' orig
psql --username super -c 'CREATE MATERIALIZED VIEW myview AS select * from 
x' orig

# change the owner of the view to myview and grant SELECT to nobody
createuser -S nobody
psql --username super -c 'GRANT SELECT ON x TO nobody' orig
psql --username super -c 'ALTER TABLE myview OWNER TO nobody;' orig

# refresh does work if you are nobody
psql --username nobody -c 'REFRESH MATERIALIZED VIEW myview;' orig

# dump and reload
pg_dump --username super --format c -f dump.dump orig
createdb copied

# pg_restore errors
pg_restore --username super -d copied dump.dump

I guess I provided a too-minimal example...

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Failure loading materialized view with pg_restore

2015-02-18 Thread Brian Sutherland
Hi,

If I run this set of commands against PostgreSQL 9.4.1 I pg_restore
throws an error with a permission problem. Why it does so is a mystery
to me, given that the user performing the restore is a superuser:

# superuser creates database and materialized view
createuser -s super
createdb --username super orig
psql --username super -c select 'USING:' || version(); orig
psql --username super -c 'CREATE TABLE x (y int);' orig
psql --username super -c 'CREATE MATERIALIZED VIEW myview AS select * from 
x' orig

# change the owner of the view to myview
createuser -S nobody
psql --username super -c 'ALTER TABLE myview OWNER TO nobody;' orig

# dump and reload
pg_dump --username super --format c -f dump.dump orig
createdb copied

# pg_restore errors
pg_restore --username super -d copied dump.dump

The error I get is:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2260; 0 16569 MATERIALIZED 
VIEW DATA myview nobody
pg_restore: [archiver (db)] could not execute query: ERROR:  permission 
denied for relation x
Command was: REFRESH MATERIALIZED VIEW myview;

In pg_hba I am using the trust method for everything (this is a test
cluster).

Is this expected behaviour or a bug?

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpython intermittent ImportErrors

2013-01-17 Thread Brian Sutherland
On Thu, Jan 17, 2013 at 03:18:09PM +0700, Stuart Bishop wrote:
 On Mon, Jan 14, 2013 at 11:30 PM, Brian Sutherland
 br...@vanguardistas.net wrote:
  Hi,
 
  I have a plpython stored procedure which sometimes fails when I run my
  applications automated test suite. The procedure is called hundreds of
  times during the tests but only fails a few times, often with the
  following ImportError:
 
  Traceback (most recent call last):
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 73, 
  in module
  __boot()
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 2, 
  in __boot
  import sys, imp, os, os.path
File /Users/jinty/src/mp/lib/python2.7/os.py, line 49, in module
  import posixpath as path
File /Users/jinty/src/mp/lib/python2.7/posixpath.py, line 15, in 
  module
  import stat
  ImportError: No module named stat
 
 The first thing that pops out here is that the paths contain
 references to buildout.

Yeah, my test environment is undeniably just too complex :(

 Can you confirm that the exception is actually being received from
 PostgreSQL, or is the error coming from your test harness?

It was definitely from PostgreSQL.

 Is your plpython stored procedure supposed to be using the environment
 constructed by buildout, or the system default environment?

I set PYTHONPATH and PYTHONHOME so that postgres uses my buildout
environment (which is inside a virtualenv :(:(). The hack I use is
undeniably ugly, but does work...

 (we use buildout for our Python code, but our plpythonu stored
 procedures use the stock standard Python environment, as provided by
 the Ubuntu packages).

Sadly, I need to get this running on OSX as that's what our developers
use. On Ubuntu/Debian, I would have definitely used the system python
environment.

 If this is the correct environment, it sounds like you are triggering
 some sort of race condition in the buildout generated .py files. You
 might be able to confirm and/or work around the issue by getting your
 own stanza added to the top of the generated site.py, explicitly
 importing the problematic modules right at the top before any buildout
 magic happens.

I mentioned in a different post, but I did manage to resolve the issue.
OSX has insanely low limits for max open files. Python hit that limit
during the import and hid the real error. Increasing the open file limit
with ulimit before starting postgres resolved the issue. I reported a
bug about it here:

http://bugs.python.org/issue16981

Many thanks for having a look anyway.

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpython intermittent ImportErrors

2013-01-17 Thread Brian Sutherland
On Thu, Jan 17, 2013 at 01:25:54PM +0100, Alban Hertroys wrote:
 On 17 January 2013 12:30, Brian Sutherland br...@vanguardistas.net wrote:
 
   (we use buildout for our Python code, but our plpythonu stored
   procedures use the stock standard Python environment, as provided by
   the Ubuntu packages).
 
  Sadly, I need to get this running on OSX as that's what our developers
  use. On Ubuntu/Debian, I would have definitely used the system python
  environment.
 
 
 Why wouldn't you do the same on OSX? It has Python 2.7 in the base system,
 doesn't it?

But the base system doesn't have many Python libraries, so you need a
packaging system.  Ubuntu/Debian has APT which I am personally very
confortable with.

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpython intermittent ImportErrors

2013-01-16 Thread Brian Sutherland
On Wed, Jan 16, 2013 at 08:10:26AM +1100, Chris Angelico wrote:
 On Tue, Jan 15, 2013 at 4:55 AM, Brian Sutherland
 br...@vanguardistas.net wrote:
  I'm guessing that it's some kind of race condition, but I wouldn't know
  where to start looking.
 
 Look for a recursive import (A imports B, B imports A) 

I've always seen circular imports as deterministic.

But I don't think it's this, because at least some of the tracebacks
occur when importing standard library code during import site at
interpreter startup. It's very unlikely there's a circular import there.

 or multiple
 threads trying to import simultaneously - Python sometimes has issues
 with that. Quite a few of those issues were sorted out in recent 3.x
 versions, but you're using 2.7.

I thought Python, even in 2.7, had an import lock to prevent multiple
threads importing simultaneously:

http://docs.python.org/2/library/imp.html#imp.lock_held

But yes, that could be a lead onto the issue, if the import lock were
broken that could result in the behaviour I see.

Hmm, but checking in the various modules shows that the import lock is
being correctly acquired.

 
 ChrisA
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpython intermittent ImportErrors RESOLVED

2013-01-16 Thread Brian Sutherland
On Mon, Jan 14, 2013 at 09:05:09AM -0800, Adrian Klaver wrote:
 On 01/14/2013 08:30 AM, Brian Sutherland wrote:
 Hi,
 
 I have a plpython stored procedure which sometimes fails when I run my
 applications automated test suite. The procedure is called hundreds of
 times during the tests but only fails a few times, often with the
 following ImportError:
 
  Traceback (most recent call last):
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 73, 
  in module
  __boot()
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 2, 
  in __boot
  import sys, imp, os, os.path
File /Users/jinty/src/mp/lib/python2.7/os.py, line 49, in module
  import posixpath as path
File /Users/jinty/src/mp/lib/python2.7/posixpath.py, line 15, in 
  module
  import stat
  ImportError: No module named stat
 
 Changing the order in which the tests are run, or running tests
 individually makes the error move/change or disappear. The behaviour is
 the same with PostgreSQL versions 9.2.2 and 9.1.7.
 
 I have tried (but failed) to reproduce this error in a simple .sql
 script. Outside of the tests, it always seems to work.
 
 Having run into a brick wall debugging this, I'm hoping there's someone
 here who can help?
 
 Since order seems to be important what test is run prior to the
 function failing versus the test run when it succeeds?

I finally got out the big hammer. I applied the attached patch to
Python/import.c and started postgres with PYTHONVERBOSE set. I
discovered that the import was failing because the fopen() call on the
module fails with:

# trying /Users/jinty/src/mp/lib/python2.7/linecache.py
Error opening file: Too many open files

So there's at least one bug here, Python should probably raise an
intelligent error message if an import fails because of too many open
files. Reported that here: http://bugs.python.org/issue16981

I had a look at the files open by the process, there were not that many,
so no leaks or anything. Just an utterly insane OSX default maximum open
file descriptors.

Running:

ulimit -n 4096

before starting PostgreSQL resolved my issue completely.

Many thanks to all who helped out!

-- 
Brian Sutherland
--- ./Python/import.c.orig  2013-01-16 13:37:49.0 +0100
+++ ./Python/import.c   2013-01-16 14:03:04.0 +0100
@@ -7,6 +7,7 @@
 #undef Yield /* undefine macro conflicting with winbase.h */
 #include pyarena.h
 #include pythonrun.h
+#include errno.h
 #include errcode.h
 #include marshal.h
 #include code.h
@@ -1478,7 +1479,13 @@
 if (filemode[0] == 'U')
 filemode = r PY_STDIOTEXTMODE;
 fp = fopen(buf, filemode);
+if (fp == NULL  Py_VerboseFlag  1) {
+PySys_WriteStderr(Error opening file: %s\n, strerror( errno 
));
+}
 if (fp != NULL) {
+if (Py_VerboseFlag  1) {
+PySys_WriteStderr(# OPENED\n);
+}
 if (case_ok(buf, len, namelen, name))
 break;
 else {   /* continue search */

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] plpython intermittent ImportErrors

2013-01-14 Thread Brian Sutherland
Hi,

I have a plpython stored procedure which sometimes fails when I run my
applications automated test suite. The procedure is called hundreds of
times during the tests but only fails a few times, often with the
following ImportError:

Traceback (most recent call last):
  File /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, 
line 73, in module
__boot()
  File /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, 
line 2, in __boot
import sys, imp, os, os.path   
  File /Users/jinty/src/mp/lib/python2.7/os.py, line 49, in module
import posixpath as path
  File /Users/jinty/src/mp/lib/python2.7/posixpath.py, line 15, in 
module
import stat
ImportError: No module named stat

Changing the order in which the tests are run, or running tests
individually makes the error move/change or disappear. The behaviour is
the same with PostgreSQL versions 9.2.2 and 9.1.7.

I have tried (but failed) to reproduce this error in a simple .sql
script. Outside of the tests, it always seems to work.

Having run into a brick wall debugging this, I'm hoping there's someone
here who can help?

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpython intermittent ImportErrors

2013-01-14 Thread Brian Sutherland
On Mon, Jan 14, 2013 at 09:05:09AM -0800, Adrian Klaver wrote:
 On 01/14/2013 08:30 AM, Brian Sutherland wrote:
 Hi,
 
 I have a plpython stored procedure which sometimes fails when I run my
 applications automated test suite. The procedure is called hundreds of
 times during the tests but only fails a few times, often with the
 following ImportError:
 
  Traceback (most recent call last):
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 73, 
  in module
  __boot()
File 
  /Users/jinty/.buildout/eggs/setuptools-0.6c11-py2.7.egg/site.py, line 2, 
  in __boot
  import sys, imp, os, os.path
File /Users/jinty/src/mp/lib/python2.7/os.py, line 49, in module
  import posixpath as path
File /Users/jinty/src/mp/lib/python2.7/posixpath.py, line 15, in 
  module
  import stat
  ImportError: No module named stat
 
 Changing the order in which the tests are run, or running tests
 individually makes the error move/change or disappear. The behaviour is
 the same with PostgreSQL versions 9.2.2 and 9.1.7.
 
 I have tried (but failed) to reproduce this error in a simple .sql
 script. Outside of the tests, it always seems to work.
 
 Having run into a brick wall debugging this, I'm hoping there's someone
 here who can help?
 
 Since order seems to be important what test is run prior to the
 function failing versus the test run when it succeeds?

Experimenting, I can get it down to about 3 tests. At that point it
succeeds about 80% of the time and the errors start being more random
(i.e. different modules are unimportable).  It also starts erroring
inside the stored procedure itself rather than at import site time.
The database backend stops closing the connection immediately.

The 2 preceding tests, in this case, do not call the stored procedure
(or any plpython code) at all.

I'm guessing that it's some kind of race condition, but I wouldn't know
where to start looking.

-- 
Brian Sutherland


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general