Re: [HACKERS] Patch for ginCombineData

2015-08-21 Thread Jeff Janes
On Wed, Aug 5, 2015 at 3:17 AM, Robert Abraham 
robert.abraha...@googlemail.com wrote:

 Hello,

 we are using gin indexes on big tables. these tables happen to have
 several billion rows.
 the index creation fails in ginCombineData in src/backend/access/ginbulk.c
 because repalloc is limited to 1 gb.
 this limitation makes no sense in this context (compare comments in
 src/include/utils/memutils.h).
 To overcome this limitation on tables with lots of rows repalloc_huge is
 used.
 The test suite still succeeds on my machine.
 Find the patch attached,


This looks good to me.

One possible issue I see is that if accum.allocatedMemory is only slightly
less than maintenance_work_mem just before we decide to repalloc, then
doing the repalloc_huge could cause us to exceed maintenance_work_mem.
Potentially by a lot.  In the worst case (when all the data we've seen
during this round of accumulation falls into the same key), we would
overrun by a factor of almost 2.  Or almost 3, if you count the time during
the repalloc when the new data has been allocated but the old data not yet
freed.  Perhaps the code here should look at the amount of
maintenance_work_mem left and grow by less than a factor of 2 if it is too
close to going over.

Mitigating that, it won't actually use very much of that memory.  Once
control passes back at the end of this tuple, the caller will then realize
it has overshot the maintenance_work_mem and will flush it all.  I think
most modern OSes will not have a problems caused by allocated but untouched
memory.

This patch doesn't introduce that problem, it just allows it to operate at
a higher absolute size.  So I'm marking this as ready for committer.

Cheers,

Jeff


Re: [HACKERS] Patch for ginCombineData

2015-08-08 Thread Robert Haas
On Wed, Aug 5, 2015 at 6:17 AM, Robert Abraham
robert.abraha...@googlemail.com wrote:
 we are using gin indexes on big tables. these tables happen to have several
 billion rows.
 the index creation fails in ginCombineData in src/backend/access/ginbulk.c
 because repalloc is limited to 1 gb.
 this limitation makes no sense in this context (compare comments in
 src/include/utils/memutils.h).
 To overcome this limitation on tables with lots of rows repalloc_huge is
 used.
 The test suite still succeeds on my machine.
 Find the patch attached,

Since nobody seems ready to act on this patch immediately, I suggest
adding it here so it doesn't get forgotten:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch for ginCombineData

2015-08-05 Thread Robert Abraham
Hello,

we are using gin indexes on big tables. these tables happen to have several
billion rows.
the index creation fails in ginCombineData in src/backend/access/ginbulk.c
because repalloc is limited to 1 gb.
this limitation makes no sense in this context (compare comments in
src/include/utils/memutils.h).
To overcome this limitation on tables with lots of rows repalloc_huge is
used.
The test suite still succeeds on my machine.
Find the patch attached,

Kind regards,

Robert Abraham
*** a/src/backend/access/gin/ginbulk.c
--- b/src/backend/access/gin/ginbulk.c
***
*** 39,45  ginCombineData(RBNode *existing, const RBNode *newdata, void *arg)
  		accum-allocatedMemory -= GetMemoryChunkSpace(eo-list);
  		eo-maxcount *= 2;
  		eo-list = (ItemPointerData *)
! 			repalloc(eo-list, sizeof(ItemPointerData) * eo-maxcount);
  		accum-allocatedMemory += GetMemoryChunkSpace(eo-list);
  	}
  
--- 39,45 
  		accum-allocatedMemory -= GetMemoryChunkSpace(eo-list);
  		eo-maxcount *= 2;
  		eo-list = (ItemPointerData *)
! 			repalloc_huge(eo-list, sizeof(ItemPointerData) * eo-maxcount);
  		accum-allocatedMemory += GetMemoryChunkSpace(eo-list);
  	}
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for ginCombineData

2015-08-05 Thread Alexander Korotkov
Hi!

On Wed, Aug 5, 2015 at 1:17 PM, Robert Abraham 
robert.abraha...@googlemail.com wrote:

 we are using gin indexes on big tables. these tables happen to have
 several billion rows.
 the index creation fails in ginCombineData in src/backend/access/ginbulk.c
 because repalloc is limited to 1 gb.
 this limitation makes no sense in this context (compare comments in
 src/include/utils/memutils.h).
 To overcome this limitation on tables with lots of rows repalloc_huge is
 used.
 The test suite still succeeds on my machine.
 Find the patch attached,


Thank you for notice and for the patch!
You should have maintenance_work_mem  1gb and some very frequent entry so
that it's posting list exceeds 1 gb itself.
These circumstances shouldn't be very rare in modern systems. I think it
could be backpatched.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company