Re: [HACKERS] Odd out of memory problem.
On Thu, Mar 29, 2012 at 7:38 PM, Peter Eisentraut pete...@gmx.net wrote:
> On tis, 2012-03-27 at 00:53 +0100, Greg Stark wrote:
>> Hm. So my original plan was dependent on adding the state-merge
>> function we've talked about in the past. Not all aggregate functions
>> necessarily can support such a function, but I think all or nearly all
>> the built-in aggregates can. Certainly min, max, count, sum, avg,
>> stddev, and array_agg can, which are most of what people do. That
>> would be a function which can take two state variables and produce a
>> new state variable.
>
> This information could also be useful to have in PL/Proxy (or similar
> FDWs) to be able to integrate aggregate computation into the language.
> Currently, you always have to do the state merging yourself.

I don't know exactly how PL/Proxy or pgpool accomplish the multi-phase
aggregate, but the proposal above is in theory a state-merge function, so
it doesn't apply to general aggregate results that have already passed
through the final function. Of course, for aggregates that don't have
final functions, it is fine to call the state-merge function on the
results.

Thanks,
--
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
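As an illustration of the state-merge idea (a hypothetical Python sketch,
not PostgreSQL code): for avg(), the transition state can be modeled as a
(count, sum) pair, and merging two partial states is element-wise
addition, so partitions aggregated separately can be combined without
re-scanning the input.

```python
# Hypothetical illustration of a state-merge function for avg().
# The transition state is a (count, sum) pair; merging two partial
# states is element-wise addition, so partial aggregates computed on
# separate partitions can be combined without re-scanning the input.

def avg_transition(state, value):
    count, total = state
    return (count + 1, total + value)

def avg_merge(state_a, state_b):
    # Takes two state variables, produces a new state variable.
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def avg_final(state):
    count, total = state
    return total / count if count else None

# Aggregate two partitions independently, then merge.
part1 = (0, 0)
for v in [1, 2, 3]:
    part1 = avg_transition(part1, v)
part2 = (0, 0)
for v in [4, 5]:
    part2 = avg_transition(part2, v)

merged = avg_merge(part1, part2)  # (5, 15)
print(avg_final(merged))          # 3.0
```

A real system would need a merge function per aggregate's state
representation, which is the part not every aggregate can support.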
Re: [HACKERS] Odd out of memory problem.
On fre, 2012-03-30 at 22:59 -0700, Hitoshi Harada wrote:
> I don't know exactly how PL/Proxy or pgpool accomplish the multi-phase
> aggregate,

They don't.

> but the proposal above is in theory a state-merge function, so it
> doesn't apply to general aggregate results that have already passed
> through the final function. Of course, for aggregates that don't have
> final functions, it is fine to call the state-merge function on the
> results.

You're right, it's not quite the same thing. But perhaps it could be
kept in mind if someone wants to develop things in this area.
Re: [HACKERS] Odd out of memory problem.
On tis, 2012-03-27 at 00:53 +0100, Greg Stark wrote:
> Hm. So my original plan was dependent on adding the state-merge
> function we've talked about in the past. Not all aggregate functions
> necessarily can support such a function, but I think all or nearly all
> the built-in aggregates can. Certainly min, max, count, sum, avg,
> stddev, and array_agg can, which are most of what people do. That
> would be a function which can take two state variables and produce a
> new state variable.

This information could also be useful to have in PL/Proxy (or similar
FDWs) to be able to integrate aggregate computation into the language.
Currently, you always have to do the state merging yourself.
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 5:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Stark st...@mit.edu writes:
>> On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Could you give us a brain dump on the sketch? I've never seen how to
>>> do it without unreasonable overhead.
>>
>> Hm. So my original plan was dependent on adding the state-merge
>> function we've talked about in the past. Not all aggregate functions
>> necessarily can support such a function, but I think all or nearly all
>> the built-in aggregates can. Certainly min, max, count, sum, avg,
>> stddev, and array_agg can, which are most of what people do. That
>> would be a function which can take two state variables and produce a
>> new state variable.
>
> I'd rather not invent new requirements for aggregate implementations if
> we can avoid it. However, now that I've started thinking about it
> further, I think you could solve it with less complexity by cheating in
> various ways. For example, if you limit the hash size to 1/2 of
> work_mem, then when you reach that limit you could just stuff any tuple
> that doesn't match a hash entry into a tuplesort with 1/2 of work_mem
> and do the regular level-break logic on the output of that.
>
> Or just start dumping such tuples into a tuplestore, while continuing
> to process tuples that match the hashagg entries that are already in
> existence. Once the input is exhausted, read out the hashagg entries we
> have, flush the hashagg table, and start reading from the tuplestore.
> Repeat as needed.
>
> I like this idea because the only thing you give up is predictability
> of the order of output of aggregated entries, which is something that a
> hashagg isn't guaranteeing anyway. In particular, we would still have a
> guarantee that any one aggregate evaluation processes the matching
> tuples in arrival order, which is critical for some aggregates.
>
> The main problem I can see is that if we start to flush after work_mem
> is X% full, we're essentially hoping that the state values for the
> existing aggregates won't grow by more than 1-X%, which is safe for
> many common aggregates but fails for some like array_agg(). Ultimately,
> for ones like that, it'd probably be best to never consider hashing at
> all. I guess we could invent an "unsafe for hash aggregation" flag for
> aggregates that have unbounded state-size requirements.

According to what I've learned in the last couple of months, array_agg
is not ready for fallbacks like dumping to a tuplestore. Even state
merging is not possible for it. The problem is that the executor doesn't
know how to serialize/deserialize the internal-type transition value. So
in one implementation, the existence of a merge function is a flag for
whether to switch back to sort grouping rather than hash aggregation,
and array_agg is one such aggregate function. That said, if you invent a
new flag to note that an aggregate is not dump-ready, it'd be worth
adding a state-merge function to the aggregate infrastructure anyway.

So a way I can imagine that needs neither a state-merge function nor
dumping to a tuplestore would be to sort the hash table contents and the
rest of the input so that we can switch to sort grouping. Since we have
a hash table, we can certainly sort its contents in memory, and we can
put to disk everything that comes later than the fallback and read it
after the scan finishes. Now that we have sorted state values and sorted
input, we can continue the rest of the work.

Anyway, the memory usage problem is not limited to array_agg and hash
aggregation. Even if you can tell that the hash table exceeds X% of
work_mem, how can you tell that other operators, such as Sort, are not
using the rest of the memory? One approach I could see to avoid this is
assigning an arbitrary amount of memory out of work_mem to each operator
and accounting for it locally, but this approach is going to skew
occasionally and is not perfect, either.

Thanks,
--
Hitoshi Harada
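The fallback Tom sketches above (hash-aggregate until a size limit; keep
aggregating tuples whose group is already in the table; spill tuples
with new keys; then repeat over the spilled input) can be illustrated
with a small, hypothetical Python model. The dict stands in for the
hashagg table and the list for the tuplestore:

```python
# Hypothetical sketch (not PostgreSQL code) of the spill-and-repeat
# fallback: hash-aggregate until the table reaches a size limit; after
# that, tuples whose key is already in the table are still aggregated,
# while tuples with new keys are spilled to a "tuplestore" (here just a
# list). When the input is exhausted, emit the current groups, empty the
# table, and repeat over the spilled tuples.

def hash_agg_with_spill(tuples, max_groups, transition, init):
    results = {}
    pending = list(tuples)
    while pending:
        table = {}
        spill = []
        for key, value in pending:
            if key in table:
                table[key] = transition(table[key], value)
            elif len(table) < max_groups:
                table[key] = transition(init, value)
            else:
                spill.append((key, value))  # new key, no room: spill it
        results.update(table)               # emit this batch of groups
        pending = spill                     # repeat over spilled tuples
    return results

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)]
out = hash_agg_with_spill(rows, max_groups=2,
                          transition=lambda s, v: s + v, init=0)
print(out)  # {'a': 4, 'b': 7, 'c': 4, 'd': 6}
```

Note how each group's input values are still seen in arrival order, the
property Tom points out is critical for some aggregates; only the order
in which finished groups are emitted changes.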
Re: [HACKERS] Odd out of memory problem.
On Tue, Mar 27, 2012 at 3:22 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
> According to what I've learned in the last couple of months, array_agg
> is not ready for fallbacks like dumping to a tuplestore. Even state
> merging is not possible for it. The problem is that the executor
> doesn't know how to serialize/deserialize the internal-type transition
> value. So in one implementation, the existence of a merge function is a
> flag for whether to switch back to sort grouping rather than hash
> aggregation, and array_agg is one such aggregate function. That said,
> if you invent a new flag to note that an aggregate is not dump-ready,
> it'd be worth adding a state-merge function to the aggregate
> infrastructure anyway.
>
> So a way I can imagine that needs neither a state-merge function nor
> dumping to a tuplestore would be to sort the hash table contents and
> the rest of the input so that we can switch to sort grouping. Since we
> have a hash table, we can certainly sort its contents in memory, and we
> can put to disk everything that comes later than the fallback and read
> it after the scan finishes. Now that we have sorted state values and
> sorted input, we can continue the rest of the work.

It's a little bit tricky to make this work - you have to get all of the
values out of the hash table you've built and stick them into a
Tuplesort object - but I think it can be made to work, and it seems more
elegant than anything else proposed so far.

I also agree with you and with Greg Stark that it would be good to
invent a state-merge function. Although it wouldn't apply to every case,
it would make some very common cases a lot more efficient, both in run
time and in memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
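The sort-based fallback discussed above can also be modeled with a
small, hypothetical Python sketch: sort the (key, state) entries from
the abandoned hash table, sort the remaining raw input by key, then
merge the two sorted streams with ordinary group-break processing.

```python
# Hypothetical sketch (not PostgreSQL code) of the sort-based fallback:
# when the hash table fills up, sort its (key, state) entries, sort the
# remaining input by key, and finish with sorted-group ("GroupAggregate")
# processing that merges pre-aggregated states with leftover raw tuples.
from heapq import merge
from itertools import groupby

def fallback_to_sort(hash_table, remaining, transition, init=0):
    # Pre-aggregated states from the abandoned hash aggregate, by key.
    states = sorted(hash_table.items())
    # In PostgreSQL the remaining input would be spilled to disk and
    # sorted (a Tuplesort); here we just sort it in memory.
    rest = sorted(remaining)
    # Merge the two sorted streams; the 0/1 tag makes a group's
    # pre-aggregated state sort ahead of its raw values.
    stream = merge(((k, 0, s) for k, s in states),
                   ((k, 1, v) for k, v in rest))
    out = []
    for key, group in groupby(stream, key=lambda t: t[0]):
        state = init
        for _, tag, v in group:
            state = v if tag == 0 else transition(state, v)
        out.append((key, state))
    return out

# Groups a and c were partially aggregated (sums) before the fallback;
# raw tuples for b and a arrived afterwards.
print(fallback_to_sort({"a": 4, "c": 1}, [("b", 2), ("a", 3)],
                       lambda s, v: s + v))
# [('a', 7), ('b', 2), ('c', 1)]
```

Note the caveat from the thread: this only works for aggregates whose
transition state can be written out and re-read, which array_agg's
internal-type state cannot.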
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 01:54 PM, Andrew Dunstan wrote:
> On 03/26/2012 01:34 PM, Tom Lane wrote:
>> Andrew Dunstan and...@dunslane.net writes:
>>> On 03/26/2012 01:06 PM, Heikki Linnakangas wrote:
>>>> Is it possible this job is inserting and then updating (or deleting)
>>>> the row it just inserted and doing a large number of such
>>>> insert/update operations all within the same transaction? Or perhaps
>>>> it's updating the same row over and over again?
>>> It's all in a single transaction. In fact the solution I'm currently
>>> testing, which seems to be working, involves breaking it up into
>>> batches of a few thousand LOs restored per batch.
>> Hm. The test case is just a straight pg_restore of lots and lots of
>> LOs? What pg_dump version was the dump made with?
> 8.4.8, same as the target. We get the same issue whether we restore
> direct to the database from pg_restore or via a text dump.

Following this up, the workaround of making small batches of LOs did
solve that memory issue. Here's what I did:

   pg_restore --list full_export.dmp | grep BLOB > bloblist
   pg_restore --use-list=bloblist full_export.dmp | \
     perl -n -e '$n++ if /lo_open/;' \
          -e 'print "end; begin;\n" if (/lo_open/ && ($n % 1000 == 0));' \
          -e 'print;' | \
     psql -t -q -v ON_ERROR_STOP dbname=adtest > /dev/null

That's a fairly ugly hack to have to use.

cheers

andrew
Re: [HACKERS] Odd out of memory problem.
hello,

does the problem show up on 2% of all problems after 2 weeks or so? we
had a similar problem on UNIX as well. it even materialized on 100
identical boxes (on 2% of them). it pops up randomly and never stops …

i checked some code paths. some of those messages are direct output via
stderr (not even elog). unfortunately i did not manage to find a box
where i could use GDB to attack the problem. it was 8.4.8 as well. do
you see a certain workload which would make the problem reproducible?

regards,

hans

On Mar 26, 2012, at 5:03 PM, Andrew Dunstan wrote:
> I'm not sure if this is a bug, but I have been wrestling with this
> problem for a client. Platform is Windows Server 2003 64 bit,
> PostgreSQL 8.4.8, 4Gb RAM, running on an Amazon VM. Shared buffers:
> 512Mb, work_mem: 25Mb. There are only a handful of connections to the
> database, and no other activity. We are seeing the error shown below.
> The table in question has two columns (Oid, int) and roughly 43m rows.
> The only other thing remarkable about the settings is that
> effective_cache_size is set to 5Gb, which is clearly too high, but
> surely that shouldn't cause a memory error. I'm really perplexed as to
> why this fairly simple query should cause an out of memory error:
>
>    select loid, max(pageno) from ldata group by loid order by 2 desc limit 10;
>
> I can't see what I might be missing.
cheers

andrew

TopMemoryContext: 49816 total in 6 blocks; 5384 free (7 chunks); 44432 used
  TopTransactionContext: 8192 total in 1 blocks; 7696 free (0 chunks); 496 used
  Record information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  MessageContext: 40960 total in 3 blocks; 29920 free (6 chunks); 11040 used
  smgr relation table: 8192 total in 1 blocks; 2816 free (0 chunks); 5376 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks); 104 used
      ExecutorState: 8192 total in 1 blocks; 2144 free (1 chunks); 6048 used
        TupleSort: 40984 total in 3 blocks; 24208 free (10 chunks); 16776 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        AggContext: 864018432 total in 127 blocks; 3400 free (110 chunks); 864015032 used
          TupleHashTable: 619175960 total in 95 blocks; 821528 free (331 chunks); 618354432 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
  CacheMemoryContext: 667696 total in 20 blocks; 169960 free (2 chunks); 497736 used
    pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_database_datname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total
Re: [HACKERS] Odd out of memory problem.
Andrew Dunstan and...@dunslane.net writes:
> I'm really perplexed as to why this fairly simple query should cause an
> out of memory error:
> select loid, max(pageno) from ldata group by loid order by 2 desc limit 10;

Looks like the group by/aggregate step is eating lots of memory:

> AggContext: 864018432 total in 127 blocks; 3400 free (110 chunks); 864015032 used
> TupleHashTable: 619175960 total in 95 blocks; 821528 free (331 chunks); 618354432 used

A guess is that there are a huge number of distinct values of loid but
the planner fails to realize that and tries to use a hash aggregation.
Could we see EXPLAIN output for this query?

regards, tom lane
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 11:18 AM, Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> I'm really perplexed as to why this fairly simple query should cause
>> an out of memory error:
>> select loid, max(pageno) from ldata group by loid order by 2 desc limit 10;
> Looks like the group by/aggregate step is eating lots of memory:
>> AggContext: 864018432 total in 127 blocks; 3400 free (110 chunks); 864015032 used
>> TupleHashTable: 619175960 total in 95 blocks; 821528 free (331 chunks); 618354432 used
> A guess is that there are a huge number of distinct values of loid but
> the planner fails to realize that and tries to use a hash aggregation.
> Could we see EXPLAIN output for this query?

Currently it shows:

 Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
   ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
         Sort Key: (max(pageno))
         ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
               ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
                     Sort Key: loid
                     ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)

The table might have been analysed since I ran the query, though.

To answer Hans' question, we have seen the problem in other contexts. We
first noticed this problem in a failure to restore large objects when
running pg_restore. The database has 43,998,486 LOs on 114,467,137
pages. The largest of these is 2160 pages. We're currently running a
test to see if we can successfully restore LOs by doing them in smaller
batches rather than in a single transaction. However, this one seemed
even odder than the LO problem.

cheers

andrew
Re: [HACKERS] Odd out of memory problem.
Andrew Dunstan and...@dunslane.net writes:
> On 03/26/2012 11:18 AM, Tom Lane wrote:
>> Could we see EXPLAIN output for this query?

> Currently it shows:
>  Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
>    ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
>          Sort Key: (max(pageno))
>          ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
>                ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
>                      Sort Key: loid
>                      ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)
> The table might have been analysed since I ran the query, though.

That plan should not create a tuple hash table, so I think it's almost
certain that the plan changed. It might be interesting to remove the
pg_statistic rows for the table and then see what plan you get.

> To answer Hans' question, we have seen the problem in other contexts.
> We first noticed this problem in a failure to restore large objects
> when running pg_restore.

[ scratches head... ] I don't understand how or why pg_restore would be
executing such a query.

regards, tom lane
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
> TupleHashTable: 619175960 total in 95 blocks; 821528 free (331 chunks); 618354432 used

I think the plan you showed isn't the plan that's running out of memory.
I think it's running out of memory because it's using a Hash Aggregate
and underestimating the number of distinct elements. I would set
enable_hashagg=false for this query (not for the whole server; hash
aggregates are quite useful in general).

This is a long-standing problem with hash aggregates. The problem is
that we don't have a clever way to fall back to a non-hash aggregate if
the original estimated memory usage turns out to be way off. Just
spilling to disk the way hash joins do would be horrendously expensive,
we think. If we had an approach to doing so it would be very useful,
because simply running out of memory is kind of rude.

--
greg
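The scale of the misestimate here can be put on a back of an envelope
(the numbers below are assumptions for illustration, not measurements):
the planner's default of 200 distinct groups versus the roughly 44
million distinct loids reported earlier in the thread, with an assumed
per-group cost for a hash entry plus transition state.

```python
# Back-of-envelope (assumed numbers, not measured ones) for the failure
# mode described above: the planner's default estimate of 200 distinct
# groups versus ~44 million actual distinct loids. bytes_per_group is an
# assumed per-entry cost for a hash table entry plus transition state.

def hashagg_memory(n_groups, bytes_per_group=40):
    return n_groups * bytes_per_group

planned = hashagg_memory(200)         # what the planner budgeted for
actual = hashagg_memory(44_000_000)   # what the data actually contained

print(f"planned: {planned / 1024:.1f} kB")
print(f"actual:  {actual / 2**20:.0f} MB")
```

The planned table is a few kB; the actual one is on the order of the
~1.4 GB AggContext plus TupleHashTable seen in the memory dump, far past
any plausible work_mem.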
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 12:11 PM, Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> On 03/26/2012 11:18 AM, Tom Lane wrote:
>>> Could we see EXPLAIN output for this query?
>> Currently it shows:
>>  Limit  (cost=19443025.87..19443025.89 rows=10 width=8)
>>    ->  Sort  (cost=19443025.87..19446451.29 rows=1370168 width=8)
>>          Sort Key: (max(pageno))
>>          ->  GroupAggregate  (cost=18537785.99..19413417.03 rows=1370168 width=8)
>>                ->  Sort  (cost=18537785.99..18823953.97 rows=114467192 width=8)
>>                      Sort Key: loid
>>                      ->  Seq Scan on ldata  (cost=0.00..1651163.92 rows=114467192 width=8)
>> The table might have been analysed since I ran the query, though.
> That plan should not create a tuple hash table, so I think it's almost
> certain that the plan changed. It might be interesting to remove the
> pg_statistic rows for the table and then see what plan you get.

Yeah, that gets us:

 Limit  (cost=2223492.78..2223492.81 rows=10 width=8)
   ->  Sort  (cost=2223492.78..2223493.28 rows=200 width=8)
         Sort Key: (max(pageno))
         ->  HashAggregate  (cost=2223485.96..2223488.46 rows=200 width=8)
               ->  Seq Scan on ldata  (cost=0.00..1651154.64 rows=114466264 width=8)

>> To answer Hans' question, we have seen the problem in other contexts.
>> We first noticed this problem in a failure to restore large objects
>> when running pg_restore.
> [ scratches head... ] I don't understand how or why pg_restore would be
> executing such a query.

It's not. I was explaining that we have seen memory failures in *other*
contexts, not just this query. The restore fails after many hours on a
call to lo_write().

cheers

andrew
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 12:20 PM, Greg Stark wrote:
> On Mon, Mar 26, 2012 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
>> TupleHashTable: 619175960 total in 95 blocks; 821528 free (331 chunks); 618354432 used
> I think the plan you showed isn't the plan that's running out of
> memory. I think it's running out of memory because it's using a Hash
> Aggregate and underestimating the number of distinct elements. I would
> set enable_hashagg=false for this query (not for the whole server; hash
> aggregates are quite useful in general).
>
> This is a long-standing problem with hash aggregates. The problem is
> that we don't have a clever way to fall back to a non-hash aggregate if
> the original estimated memory usage turns out to be way off. Just
> spilling to disk the way hash joins do would be horrendously expensive,
> we think. If we had an approach to doing so it would be very useful,
> because simply running out of memory is kind of rude.

Ugh. Ok, thanks for the explanation. So it looks like I'm running into
two separate memory problems.
Here's a trace from the other one.

cheers

andrew

TopMemoryContext: 49816 total in 6 blocks; 6840 free (9 chunks); 42976 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Filesystem: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
  TopTransactionContext: 213901336 total in 10 blocks; 1796344 free (25 chunks); 212104992 used
    Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381 chunks); 750329768 used
  MessageContext: 73728 total in 4 blocks; 44920 free (15 chunks); 28808 used
  smgr relation table: 8192 total in 1 blocks; 2816 free (0 chunks); 5376 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks); 104 used
      ExecutorState: 8192 total in 1 blocks; 5880 free (1 chunks); 2312 used
        ExprContext: 24576 total in 2 blocks; 13664 free (8 chunks); 10912 used
  Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
  CacheMemoryContext: 405552 total in 19 blocks; 14296 free (5 chunks); 391256 used
    pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
Re: [HACKERS] Odd out of memory problem.
Andrew Dunstan and...@dunslane.net writes:
> On 03/26/2012 12:11 PM, Tom Lane wrote:
>> That plan should not create a tuple hash table, so I think it's almost
>> certain that the plan changed. It might be interesting to remove the
>> pg_statistic rows for the table and then see what plan you get.

> Yeah, that gets us:
>  Limit  (cost=2223492.78..2223492.81 rows=10 width=8)
>    ->  Sort  (cost=2223492.78..2223493.28 rows=200 width=8)
>          Sort Key: (max(pageno))
>          ->  HashAggregate  (cost=2223485.96..2223488.46 rows=200 width=8)
>                ->  Seq Scan on ldata  (cost=0.00..1651154.64 rows=114466264 width=8)

Hm. This illustrates that it's not too prudent to rely on a default
numdistinct estimate to decide that a hash aggregation is safe :-(. We
had probably better tweak the cost estimation rules to not trust that.
Maybe, if we have a default estimate, we should take the worst-case
estimate that the column might be unique? That could still burn us if
the rowcount estimate was horribly wrong, but those are not nearly as
shaky as numdistinct estimates ...

>> [ scratches head... ] I don't understand how or why pg_restore would
>> be executing such a query.

> It's not. I was explaining that we have seen memory failures in *other*
> contexts, not just this query. The restore fails after many hours on a
> call to lo_write().

Seems probably unrelated then. Have you got a memory-usage dump for that
case?

regards, tom lane
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote:
> Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381 chunks); 750329768 used

I think you'll have to catch Heikki's attention to get a good answer to
this. Is it possible this job is inserting and then updating (or
deleting) the row it just inserted and doing a large number of such
insert/update operations all within the same transaction? Or perhaps
it's updating the same row over and over again?

--
greg
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Hm. This illustrates that it's not too prudent to rely on a default
> numdistinct estimate to decide that a hash aggregation is safe :-(. We
> had probably better tweak the cost estimation rules to not trust that.
> Maybe, if we have a default estimate, we should take the worst-case
> estimate that the column might be unique? That could still burn us if
> the rowcount estimate was horribly wrong, but those are not nearly as
> shaky as numdistinct estimates ...

The selectivity API should include some way of indicating the accuracy
of the answer, as well as the answer itself. That way we could respond
better in a wide range of circumstances.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Odd out of memory problem.
On 26.03.2012 19:51, Greg Stark wrote:
> On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote:
>> Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381 chunks); 750329768 used
> I think you'll have to catch Heikki's attention to get a good answer to
> this. Is it possible this job is inserting and then updating (or
> deleting) the row it just inserted and doing a large number of such
> insert/update operations all within the same transaction? Or perhaps
> it's updating the same row over and over again?

.. and all that in different subtransactions.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Hm. This illustrates that it's not too prudent to rely on a default
> numdistinct estimate to decide that a hash aggregation is safe :-(. We
> had probably better tweak the cost estimation rules to not trust that.
> Maybe, if we have a default estimate, we should take the worst-case
> estimate that the column might be unique? That could still burn us if
> the rowcount estimate was horribly wrong, but those are not nearly as
> shaky as numdistinct estimates ...

Perhaps we should have two work_mem settings -- one for the target to
aim for and one for a hard(er) limit that we should ensure the worst
case falls under?

I have a sketch for how to handle spilling hash aggregates to disk in my
head. I'm not sure if it's worth the amount of complexity it would
require, but I'll poke around a bit and see if it works out well.

--
greg
Re: [HACKERS] Odd out of memory problem.
On 26.03.2012 19:59, Heikki Linnakangas wrote:
> On 26.03.2012 19:51, Greg Stark wrote:
>> On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote:
>>> Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381 chunks); 750329768 used
>> I think you'll have to catch Heikki's attention to get a good answer
>> to this. Is it possible this job is inserting and then updating (or
>> deleting) the row it just inserted and doing a large number of such
>> insert/update operations all within the same transaction? Or perhaps
>> it's updating the same row over and over again?
> .. and all that in different subtransactions.

sorry, scratch that, they don't need to be in different subtransactions.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
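Why "Combo CIDs" grows without bound in this workload can be illustrated
with a hypothetical Python model (a simplification, not the PostgreSQL
implementation): when a transaction updates or deletes a row it inserted
earlier in the same transaction, the tuple must carry both the inserting
command id (cmin) and the deleting one (cmax), so the backend maps each
distinct (cmin, cmax) pair to a combo id in an in-memory table that is
only released at transaction end.

```python
# Hypothetical model of combo-CID accumulation. Each distinct
# (cmin, cmax) pair seen within one transaction gets an entry that
# survives until commit or abort.

combo_table = {}  # (cmin, cmax) -> combo id, lives until transaction end

def get_combo_cid(cmin, cmax):
    key = (cmin, cmax)
    if key not in combo_table:
        combo_table[key] = len(combo_table)  # never freed mid-transaction
    return combo_table[key]

# A restore-style workload: each object is inserted at one command and
# touched again at the next, all in one transaction, so every pair is new.
for cid in range(0, 100_000, 2):
    get_combo_cid(cid, cid + 1)

print(len(combo_table))  # 50000 distinct pairs, all retained
```

This is consistent with the batching workaround in the thread: committing
every few thousand large objects resets the table, bounding its size.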
Re: [HACKERS] Odd out of memory problem.
Greg Stark st...@mit.edu writes: I have a sketch for how to handle spilling hash aggregates to disk in my head. I'm not sure if it's worth the amount of complexity it would require but I'll poke around a bit and see if it works out well. It'd be awfully nice if those could spill to disk. I think that currently that's the only plan type where a misestimate can lead to hard failure rather than just slower-than-you'd-like. Which is not nice considering that the estimates are necessarily just estimates. Could you give us a brain dump on the sketch? I've never seen how to do it without unreasonable overhead. regards, tom lane
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 01:06 PM, Heikki Linnakangas wrote: On 26.03.2012 19:59, Heikki Linnakangas wrote: On 26.03.2012 19:51, Greg Stark wrote: On Mon, Mar 26, 2012 at 5:41 PM, Andrew Dunstan and...@dunslane.net wrote: Combo CIDs: 755490840 total in 100 blocks; 5161072 free (381 chunks); 750329768 used I think you'll have to catch Heikki's attention to get a good answer to this. Is it possible this job is inserting and then updating (or deleting) the row it just inserted, and doing a large number of such insert/update operations all within the same transaction? Or perhaps it's updating the same row over and over again? ... and all that in different subtransactions. Sorry, scratch that, they don't need to be in different subtransactions. It's all in a single transaction. In fact the solution I'm currently testing, which seems to be working, involves breaking it up into batches of a few thousand LOs restored per batch. cheers andrew
Re: [HACKERS] Odd out of memory problem.
Andrew Dunstan and...@dunslane.net writes: On 03/26/2012 01:06 PM, Heikki Linnakangas wrote: Is it possible this job is inserting and then updating (or deleting) the row it just inserted, and doing a large number of such insert/update operations all within the same transaction? Or perhaps it's updating the same row over and over again? It's all in a single transaction. In fact the solution I'm currently testing, which seems to be working, involves breaking it up into batches of a few thousand LOs restored per batch. Hm. The test case is just a straight pg_restore of lots and lots of LOs? What pg_dump version was the dump made with? regards, tom lane
Re: [HACKERS] Odd out of memory problem.
On 03/26/2012 01:34 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 03/26/2012 01:06 PM, Heikki Linnakangas wrote: Is it possible this job is inserting and then updating (or deleting) the row it just inserted, and doing a large number of such insert/update operations all within the same transaction? Or perhaps it's updating the same row over and over again? It's all in a single transaction. In fact the solution I'm currently testing, which seems to be working, involves breaking it up into batches of a few thousand LOs restored per batch. Hm. The test case is just a straight pg_restore of lots and lots of LOs? What pg_dump version was the dump made with? 8.4.8, same as the target. We get the same issue whether we restore directly to the database from pg_restore or via a text dump. cheers andrew
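The batching workaround Andrew describes can be sketched roughly as follows. This is a hypothetical driver script, not his actual solution: it assumes the large objects are available as files listed one path per line in `lo_files.txt`, and the batch size of 1000 is arbitrary. The point is simply that committing every batch keeps any single transaction from accumulating millions of combo CIDs.

```shell
# Hypothetical sketch: import LOs 1000 at a time, one transaction per
# batch, so the combo CID cache is released at each COMMIT.
split -l 1000 lo_files.txt batch_
for f in batch_*; do
  {
    echo "BEGIN;"
    # turn each path into: SELECT lo_import('path');
    sed "s/.*/SELECT lo_import('&');/" "$f"
    echo "COMMIT;"
  } | psql -d mydb
done
```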
Re: [HACKERS] Odd out of memory problem.
Andrew Dunstan and...@dunslane.net writes: On 03/26/2012 01:34 PM, Tom Lane wrote: Hm. The test case is just a straight pg_restore of lots and lots of LOs? What pg_dump version was the dump made with? 8.4.8, same as the target. We get the same issue whether we restore direct to the database from pg_restore or via a text dump. I believe I see the issue: when creating/loading LOs, we first do a lo_create (which in 8.4 makes a page-zero tuple in pg_largeobject containing zero bytes of data) and then lo_write, which will do a heap_update to overwrite that tuple with data. This happens at the next command in the same transaction, so the original tuple has to receive a combo CID. Net result: we accumulate one new combo CID per large object loaded in the same transaction. You can reproduce this without any pg_dump involvement at all, using something like

create table mylos (id oid);
insert into mylos select lo_import('/tmp/junk') from generate_series(1,100);

The problem is gone in 9.0 and up because now we use a pg_largeobject_metadata entry, instead of a pg_largeobject row, to flag the existence of an empty large object. I don't see any very practical backend fix for the problem in 8.x. regards, tom lane
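The growth Tom describes is easy to model: a combo CID is assigned for each distinct (cmin, cmax) pair a transaction produces, and each lo_create/lo_write pair happens at fresh command IDs, so every large object contributes a pair never seen before. A toy model (this is an illustration of the growth, not the real combocid.c logic):

```python
def simulate_lo_restore(n_large_objects):
    """Toy model: one combo CID per distinct (cmin, cmax) pair.

    Each large object's tuple is inserted by lo_create at one command ID
    and updated by lo_write at the next, so its (cmin, cmax) pair is new
    every time -- the combo CID cache grows by one entry per object.
    """
    combo_cids = {}      # (cmin, cmax) -> combo CID
    command_id = 0
    for _ in range(n_large_objects):
        cmin = command_id       # lo_create inserts the tuple here
        cmax = command_id + 1   # lo_write updates it at the next command
        pair = (cmin, cmax)
        if pair not in combo_cids:
            combo_cids[pair] = len(combo_cids)
        command_id += 2
    return len(combo_cids)

print(simulate_lo_restore(100000))  # 100000: one combo CID per LO
```

Restoring in separate transactions resets the cache between batches, which is exactly why Andrew's batching workaround keeps memory bounded.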
Re: [HACKERS] Odd out of memory problem.
On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: I have a sketch for how to handle spilling hash aggregates to disk in my head. I'm not sure if it's worth the amount of complexity it would require but I'll poke around a bit and see if it works out well. It'd be awfully nice if those could spill to disk. I think that currently that's the only plan type where a misestimate can lead to hard failure rather than just slower-than-you'd-like. Which is not nice considering that the estimates are necessarily just estimates. Could you give us a brain dump on the sketch? I've never seen how to do it without unreasonable overhead. Hm. So my original plan was dependent on adding the state-merge function we've talked about in the past. Not all aggregate functions can necessarily support such a function, but I think all or nearly all the builtin aggregates can. Certainly min, max, count, sum, avg, stddev, and array_agg can, which cover most of what people do. That would be a function which can take two state variables and produce a new state variable. If you have this then you can sort and spill the hash table, start a new hash table, and keep going. When you're done you merge the hash tables using something like a heap merge, but applying the state-merge function. However, now that I've started thinking about it further, I think you could solve it with less complexity by cheating in various ways. For example, if you limit the hash size to 1/2 of work_mem, then when you reach that limit you could just stuff any tuple that doesn't match a hash entry into a tuplesort with the other 1/2 of work_mem and do the regular level-break logic on the output of that. Or, if you can count on tuplesort to be a stable sort (and I think it might be already), then you could just spill the sorted hash entries and then switch to doing a tape sort for the rest. When you finish merging you can read them back and do level-break logic like normal.
If there is a partially computed state it will be the first one in its equal group. If you can't count on tuplesort to be stable, you could dump the partially computed states to a different tape and do a merge between it and the sorted output of the main data set. The holy grail of this kind of merging of the two algorithms would be something that keeps the hash table going and maintains an LRU of hash entries. When it grows too large it would spill the oldest partial state, then sort those states and merge them (possibly in a single step). That might be too complex to pull its weight, given that some of the above sketches are probably simple enough. -- greg
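Under Greg's stated assumption that each aggregate has a state-merge function, the spill-and-merge scheme can be sketched like this. All names here are hypothetical; a real executor would spill sorted runs to tapes on disk rather than keep them in Python lists, and the memory limit would be bytes of work_mem rather than an entry count:

```python
import heapq
from itertools import groupby

def hash_agg_with_spill(tuples, transfn, mergefn, init, max_entries):
    """Sketch: fill a hash table; when it would exceed max_entries, sort
    and spill its partial states as a run and start a fresh table. At the
    end, merge the sorted runs, combining partial states for equal keys
    with the state-merge function."""
    runs, table = [], {}
    for key, value in tuples:
        if key not in table and len(table) >= max_entries:
            runs.append(sorted(table.items()))  # spill one sorted run
            table = {}
        table[key] = transfn(table.get(key, init), value)
    runs.append(sorted(table.items()))
    # Heap-merge the runs; equal keys arrive adjacent and get state-merged.
    merged = heapq.merge(*runs, key=lambda kv: kv[0])
    return {k: _merge_group(g, mergefn)
            for k, g in groupby(merged, key=lambda kv: kv[0])}

def _merge_group(group, mergefn):
    states = [state for _, state in group]
    out = states[0]
    for s in states[1:]:
        out = mergefn(out, s)   # the hypothesized state-merge function
    return out

# Example: sum(). transfn folds in an input value; mergefn adds partial sums.
result = hash_agg_with_spill(
    [(i % 10, 1) for i in range(1000)],
    transfn=lambda s, v: s + v, mergefn=lambda a, b: a + b,
    init=0, max_entries=4)
print(result[0])  # 100
```

Note that mergefn here only ever combines transition states, never finalized values, which matches the point made later in the thread that the merge must happen before any final function runs.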
Re: [HACKERS] Odd out of memory problem.
Greg Stark st...@mit.edu writes: On Mon, Mar 26, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Could you give us a brain dump on the sketch? I've never seen how to do it without unreasonable overhead. Hm. So my original plan was dependent on adding the state-merge function we've talked about in the past. Not all aggregate functions can necessarily support such a function, but I think all or nearly all the builtin aggregates can. Certainly min, max, count, sum, avg, stddev, and array_agg can, which cover most of what people do. That would be a function which can take two state variables and produce a new state variable. I'd rather not invent new requirements for aggregate implementations if we can avoid it. However now that I've started thinking about it further I think you could solve it with less complexity by cheating in various ways. For example, if you limit the hash size to 1/2 of work_mem, then when you reach that limit you could just stuff any tuple that doesn't match a hash entry into a tuplesort with the other 1/2 of work_mem and do the regular level-break logic on the output of that. Or just start dumping such tuples into a tuplestore, while continuing to process tuples that match the hashagg entries that are already in existence. Once the input is exhausted, read out the hashagg entries we have, flush the hashagg table, and start reading from the tuplestore. Repeat as needed. I like this idea because the only thing you give up is predictability of the order of output of aggregated entries, which is something that a hashagg isn't guaranteeing anyway. In particular, we would still have a guarantee that any one aggregate evaluation processes the matching tuples in arrival order, which is critical for some aggregates.
The main problem I can see is that if we start to flush after work_mem is X% full, we're essentially hoping that the state values for the existing aggregates won't grow by more than the remaining (100-X)%, which is safe for many common aggregates but fails for some like array_agg(). Ultimately, for ones like that, it'd probably be best to never consider hashing at all. I guess we could invent an "unsafe for hash aggregation" flag for aggregates that have unbounded state-size requirements. regards, tom lane
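Tom's variant needs no merge machinery at all: once the hash table is full, stop admitting new groups, divert tuples for unseen groups to a tuplestore, and make repeated passes. A hedged sketch under the same simplifications as before (hypothetical names; a Python list stands in for the disk-backed tuplestore, and an entry count stands in for work_mem):

```python
def hash_agg_multipass(tuples, transfn, init, max_entries):
    """Sketch of the tuplestore scheme: when the table hits its limit,
    tuples for groups not already present go to an overflow store.
    After the input is exhausted, emit the table's entries, reset it,
    and reprocess the overflow -- repeating until nothing is left.
    Each group still sees its tuples in arrival order."""
    results = {}
    pending = list(tuples)
    while pending:
        table, overflow = {}, []
        for key, value in pending:
            if key in table:
                table[key] = transfn(table[key], value)
            elif len(table) < max_entries:
                table[key] = transfn(init, value)
            else:
                overflow.append((key, value))  # stand-in for a tuplestore
        results.update(table)
        pending = overflow
    return results

# Example: count(*) per key; 5 keys appearing 20 times each, table
# limited to 2 groups, so three passes are needed.
counts = hash_agg_multipass(
    [(i % 5, 1) for i in range(100)],
    transfn=lambda s, v: s + v, init=0, max_entries=2)
print(counts)  # every key ends up with count 20
```

The trade-off Tom names is visible here: output order depends on which pass a group was admitted in, but within a group the arrival order of its tuples is preserved.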