Roxanne, you seem to have isolated the problem to a particular geometry column, 
which speaks to this being a PostGIS problem. Since the analyze code was 
re-written in 2.1, and your issue is coming up in a 2.0-to-2.1 upgrade, that 
further points to the issue potentially being in PostGIS. Unless the same data 
works in a PgSQL 9.2/PostGIS 2.1 combo, it seems clear that PgSQL 9.3 is not 
the culprit here. (Though I would love to be reassured that the 9.2/2.1 combo 
also fails, since that would eliminate a bad interaction between 9.3 and 2.1 
as the issue.)

At some point in order to debug I’ll probably need a copy of the data, or 
access to a system that has the data and a dev environment. Please do file a 
ticket at http://trac.osgeo.org/postgis on this issue.
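When you file, pasting the exact versions in play into the ticket saves a 
round trip, e.g.:

  SELECT version();
  SELECT postgis_full_version();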

P

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net

On May 7, 2014 at 11:15:10 PM, Roxanne Reid-Bennett (r...@tara-lu.com) wrote:

Hello,  

We are working out the upgrade of our servers from Postgres 9.1 and  
Postgis 2.0 to Postgres 9.3 and Postgis 2.1.  
After building the base stack, the system admin restored the database  
from a backup. [I'll ask for more details if you need them]  

I have 3 tables with geometry columns that run the system out of memory  
when they are autovacuumed, vacuumed, or analyzed. For one of the tables,  
I have isolated the problem to a geometry column. There are tables in the  
system that are much larger on disk, also with geometry columns, that  
vacuum analyze just fine, so it isn't simply that they have geometry  
columns. Two of the tables are related to each other; the other is a load  
of government-supplied data, completely separate from them in both content  
and structure.  

Using the smallest table, we looked at maintenance_work_mem and tried  
several runs with varying values [16MB, 64MB, 256MB, and 500MB]. A larger  
maintenance_work_mem allows the process to run longer before it starts  
gobbling up swap, but the process still spends most of its time in the  
"uninterruptible sleep (usually IO)" state and just eats up swap  
until all of the memory is gone.  
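
Each run was essentially of this shape (a sketch; we didn't keep the  
exact sessions):  

<sql>  
-- per-session override; repeated with 16MB, 64MB, 256MB, and 500MB
SET maintenance_work_mem = '64MB';
ANALYZE VERBOSE temp.region;  -- the smallest table, defined below
</sql>  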

Smallest table definition, config, and log file entries, etc. follow  
below. If I have failed to provide necessary or desired information,  
just ask.  

We have noted that memory management was changed going into 9.3, but we  
haven't been able to find anything that would indicate any known issues.  
This problem caused us to take a hard look at the stack again; we will be  
building a new stack anyway because we need a newer GEOS, but we are  
seriously considering dropping Postgres back to 9.2.  

I am out of ideas on what else to try after maintenance_work_mem ...  
Does anybody have any suggestions/questions/observations for me?  

Thank you.  

Roxanne  
------------------  

VirtualBox: 4.1.24, Intel Xeon 2.13 GHz (8), 48 GB RAM  
VirtualBox instance: 64-bit, 4 processors, base memory: 12 GB  

running Ubuntu 12.04.1 LTS  
Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012  
x86_64 x86_64 x86_64 GNU/Linux  

Postgres: PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc  
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit  
PostGis: POSTGIS="2.1.2 r12389" GEOS="3.3.3-CAPI-1.7.4" PROJ="Rel.  
4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08"  
LIBXML="2.7.8" LIBJSON="UNKNOWN" TOPOLOGY RASTER  

postgresql.conf entries (of probable interest; if I didn't list it, it's  
probably defaulted):  

max_connections = 100  
shared_buffers = 4089196kB  
work_mem = 128MB  
maintenance_work_mem = 64MB  
checkpoint_segments = 64  
checkpoint_timeout = 30min  
checkpoint_completion_target = 0.75  
effective_cache_size = 4089196kB  
default_statistics_target = 200  
autovacuum_max_workers = 1 [this is normally set to 3]  
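
If a fuller picture helps, a query along these lines (not one we ran for  
this report) will list everything non-default:  

<sql>  
-- every setting that is not at its compiled-in default
SELECT name, setting, unit, source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
</sql>  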

Analyzing the original table "activity" failed. Using a copy of the  
original table with no indexes, no foreign keys, and no constraints also  
failed. However, dropping one of the two geometry columns (region) from  
the copy allowed it to succeed. Taking a copy of just "region", which  
contains (Multi)Polygons, plus the primary key via "CREATE TABLE ... AS  
(SELECT ...)" from the original table "activity" to create temp.region...  
analyze runs out of memory. The following were run against temp.region.  
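
Reconstructed, the copy step was something like this (the exact statement  
wasn't kept):  

<sql>  
-- pull only the primary key and the suspect geometry column out of "activity"
CREATE TABLE temp.region AS
  (SELECT activity_id, region FROM activity);
</sql>  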

smallest/shortest table definition from \d:  

          Table "temp.region"  
   Column    |          Type           | Modifiers  
-------------+-------------------------+-----------  
 activity_id | integer                 |  
 region      | geometry(Geometry,4326) |  

<shell>  
HQ4_Staging=# analyze verbose temp.region;  
INFO: 00000: analyzing "temp.region"  
LOCATION: do_analyze_rel, analyze.c:335  
INFO: 00000: "region": scanned 1022 of 1022 pages, containing 52990  
live rows and 0 dead rows; 52990 rows in sample, 52990 estimated total rows  
LOCATION: acquire_sample_rows, analyze.c:1299  
The connection to the server was lost. Attempting reset: Failed.  
</shell>  

Duration of the above was approximately 1.25 hrs.  

The Log files show:  
<postgres>  
2014-05-07 16:56:56 EDT|2054| LOG: server process (PID 6663) was  
terminated by signal 9: Killed  
2014-05-07 16:56:56 EDT|2054| DETAIL: Failed process was running:  
analyze verbose temp.region;  
2014-05-07 16:56:56 EDT|2054| LOG: terminating any other active server  
processes  
</postgres>  

<syslog>  
May 7 16:56:55 hq4-staging-database kernel: [458605.351369] postgres  
invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0  

... [I have the full stack if anyone wants it]  

May 7 16:56:55 hq4-staging-database kernel: [458605.408021] Out of  
memory: Kill process 6663 (postgres) score 920 or sacrifice child  
May 7 16:56:55 hq4-staging-database kernel: [458605.412287] Killed  
process 6663 (postgres) total-vm:20269840kB, anon-rss:8625876kB,  
file-rss:3082472kB  
</syslog>  


This is a test box, which we know is much slower/smaller than our  
production box, but normally sufficient. As a test box, we had no other  
major activity going on. We do have a cron job that looks for reports  
to be run, but no reports were run during these tests and the cron job  
doesn't interact with this table.  

I did find bug #2267, "Server crash from analyze table", from last April.  
Its test case does NOT crash this stack [which is good, because it  
shouldn't].  
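
If the raw size or complexity of the geometries would help, I can profile  
the column with something along these lines (I haven't run this yet):  

<sql>  
-- rough profile of the suspect column: row count, vertex counts, in-memory sizes
SELECT count(*) AS n_rows,
       max(ST_NPoints(region)) AS max_vertices,
       max(ST_MemSize(region)) AS max_bytes
FROM temp.region;
</sql>  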


