Re: [HACKERS] Fast GiST index build - further improvements

2012-02-02 Thread Alexander Korotkov
On Thu, Sep 8, 2011 at 8:35 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 Now that the main GiST index build patch has been committed, there are a few
 further improvements that could make it much faster still:

 Better management of the buffer pages on disk. At the moment, the
 temporary file is used as a heap of pages belonging to all the buffers in
 random order. I think we could speed up the algorithm considerably by
 reading/writing the buffer pages sequentially. For example, when an
 internal page is split, and all the tuples in its buffer are relocated,
 that would be a great chance to write the new pages of the new buffers in
 sequential order, instead of writing them back to the pages freed up by the
 original buffer, which can be spread all around the temp file. I wonder if
 we could use a separate file for each buffer? Or at least, a separate file
 for all buffers that are larger than, say 100 MB in size.

 Better management of in-memory buffer pages. When we start emptying a
 buffer, we currently flush all the buffer pages in memory to the temporary
 file, to make room for new buffer pages. But that's a waste of time, if
 some of the pages we had in memory belonged to the buffer we're about to
 empty next, or that we empty tuples to. Also, if while emptying a buffer,
 all the tuples go to just one or two lower level buffers, it would be
 beneficial to keep more than one page in-memory for those buffers.

 Buffering leaf pages. This I posted on a separate thread already:
 http://archives.postgresql.org/message-id/4e5350db.3060...@enterprisedb.com


 Also, at the moment there is one issue with the algorithm that we have
 glossed over this far: For each buffer, we keep some information in memory,
 in the hash table, and in the auxiliary lists. That means that the amount
 of memory needed for the build scales with the size of the index. If you're
 dealing with very large indexes, hopefully you also have a lot of RAM in
 your system, so I don't think this is a problem in practice. Still, it
 would be nice to do something about that. A straightforward idea would be
 to swap some of the information to disk. Another idea, simpler to
 implement, would be to completely destroy a buffer, freeing all the memory
 it uses, when it becomes completely empty. Then, if you're about to run out
 of memory (as defined by maintenance_work_mem), you can empty some low
 level buffers to disk to free up some.

Unfortunately, I haven't had enough time to implement any of this before the
9.2 release. Work on my PhD thesis and web projects is taking too much time.

But I think there is one thing we should fix before the 9.2 release. We assume
that the GiST index build has at least effective_cache_size/4 of cache
available to it. This assumption can easily be false on highly concurrent
systems. I don't see a way to produce a convincing estimate here, but we could
document this behaviour, so that users can tune effective_cache_size for GiST
index builds under high concurrency.
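
To illustrate what that assumption amounts to in numbers, here is a rough
back-of-the-envelope sketch (hypothetical C, not the actual gistbuild.c
logic; the concurrent_builds divisor is only an illustration of how the
cache gets shared):

#include <stdio.h>

#define BLCKSZ 8192				/* assume the default page size */

int
main(void)
{
	long	effective_cache_size = 524288;	/* in pages: 4 GB */
	int		concurrent_builds = 4;			/* e.g. parallel restore jobs */

	/* What the buffering build plans for... */
	long	assumed = effective_cache_size / 4;
	/* ...versus an (optimistic) fair share under concurrency. */
	long	actual = assumed / concurrent_builds;

	printf("build assumes %ld MB of cache, realistically gets <= %ld MB\n",
		   assumed * BLCKSZ / (1024 * 1024),
		   actual * BLCKSZ / (1024 * 1024));
	return 0;
}

So with effective_cache_size = 4GB the build plans for about 1GB of cache,
but four such concurrent builds would each really see something closer to
256MB, which is why per-session tuning of effective_cache_size is the
practical workaround for now.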

--
With best regards,
Alexander Korotkov.


[HACKERS] Fast GiST index build - further improvements

2011-09-08 Thread Heikki Linnakangas
Now that the main GiST index build patch has been committed, there are a 
few further improvements that could make it much faster still:


Better management of the buffer pages on disk. At the moment, the 
temporary file is used as a heap of pages belonging to all the buffers 
in random order. I think we could speed up the algorithm considerably by 
reading/writing the buffer pages sequentially. For example, when an 
internal page is split, and all the tuples in its buffer are relocated, 
that would be a great chance to write the new pages of the new buffers 
in sequential order, instead of writing them back to the pages freed up 
by the original buffer, which can be spread all around the temp file. I 
wonder if we could use a separate file for each buffer? Or at least, a 
separate file for all buffers that are larger than, say 100 MB in size.
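
To make the difference concrete, here is a strawman sketch of the two
allocation policies (made-up names and structures, not the actual
gistbuildbuffers.c code; error checking omitted):

#include <stdlib.h>

typedef long BlockNum;

typedef struct TempFileAllocator
{
	BlockNum	next_block;		/* next never-used block at the end of file */
	BlockNum   *free_blocks;	/* stack of blocks released by emptied buffers */
	int			n_free;
	int			free_capacity;
} TempFileAllocator;

/* Current behaviour: prefer a recycled block, wherever it happens to be. */
BlockNum
alloc_block_recycling(TempFileAllocator *a)
{
	if (a->n_free > 0)
		return a->free_blocks[--a->n_free];
	return a->next_block++;
}

/*
 * Proposed behaviour when a whole buffer is relocated after a split: always
 * append, so the relocated pages end up physically sequential in the file.
 */
BlockNum
alloc_block_sequential(TempFileAllocator *a)
{
	return a->next_block++;
}

/* Put a freed block on the free list for later reuse. */
void
release_block(TempFileAllocator *a, BlockNum blk)
{
	if (a->n_free == a->free_capacity)
	{
		a->free_capacity = a->free_capacity ? a->free_capacity * 2 : 64;
		a->free_blocks = realloc(a->free_blocks,
								 a->free_capacity * sizeof(BlockNum));
	}
	a->free_blocks[a->n_free++] = blk;
}

Recycling keeps the temp file small but scatters one buffer's pages all over
it; appending keeps each buffer's pages contiguous, at the cost of a larger
temp file (or of the separate per-buffer files suggested above).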


Better management of in-memory buffer pages. When we start emptying a 
buffer, we currently flush all the buffer pages in memory to the 
temporary file, to make room for new buffer pages. But that's a waste of 
time, if some of the pages we had in memory belonged to the buffer we're 
about to empty next, or that we empty tuples to. Also, if while emptying 
a buffer, all the tuples go to just one or two lower level buffers, it 
would be beneficial to keep more than one page in-memory for those buffers.
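
A sketch of the smarter eviction this would require (again with made-up
names, not the real data structures):

#include <stdbool.h>
#include <stddef.h>

typedef struct NodeBuffer NodeBuffer;
struct NodeBuffer
{
	bool		pages_in_memory;	/* does it currently hold in-memory pages? */
	NodeBuffer *next;				/* all existing buffers, linked together */
	/* ... page list, on-disk block numbers, etc. ... */
};

/* Write a buffer's in-memory pages out to the temporary file (stub). */
static void
flush_buffer_pages_to_temp_file(NodeBuffer *buf)
{
	/* ... actual page writes would go here ... */
	buf->pages_in_memory = false;
}

/* Is 'buf' one of the lower-level buffers the victim's tuples will go to? */
static bool
is_emptying_target(NodeBuffer *buf, NodeBuffer **targets, int ntargets)
{
	int			i;

	for (i = 0; i < ntargets; i++)
	{
		if (targets[i] == buf)
			return true;
	}
	return false;
}

/*
 * Make room before emptying 'victim': flush only pages of buffers that are
 * neither the victim itself nor one of the buffers we are about to push
 * tuples into, since those pages will be needed again almost immediately.
 * The target buffers could also be allowed to keep several pages in memory
 * each.
 */
void
make_room_for_emptying(NodeBuffer *all_buffers, NodeBuffer *victim,
					   NodeBuffer **targets, int ntargets)
{
	NodeBuffer *buf;

	for (buf = all_buffers; buf != NULL; buf = buf->next)
	{
		if (buf == victim || is_emptying_target(buf, targets, ntargets))
			continue;
		if (buf->pages_in_memory)
			flush_buffer_pages_to_temp_file(buf);
	}
}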


Buffering leaf pages. This I posted on a separate thread already: 
http://archives.postgresql.org/message-id/4e5350db.3060...@enterprisedb.com



Also, at the moment there is one issue with the algorithm that we have 
glossed over this far: For each buffer, we keep some information in 
memory, in the hash table, and in the auxiliary lists. That means that 
the amount of memory needed for the build scales with the size of the 
index. If you're dealing with very large indexes, hopefully you also 
have a lot of RAM in your system, so I don't think this is a problem in 
practice. Still, it would be nice to do something about that. A 
straightforward idea would be to swap some of the information to disk. 
Another idea, simpler to implement, would be to completely destroy 
a buffer, freeing all the memory it uses, when it becomes completely 
empty. Then, if you're about to run out of memory (as defined by 
maintenance_work_mem), you can empty some low level buffers to disk to 
free up some.
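
The latter idea could be as simple as the sketch below (hypothetical names,
with a toy hash table standing in for the real one):

#include <stdlib.h>

typedef struct NodeBuffer
{
	long		blkno;				/* internal page this buffer is attached to */
	int			ntuples;			/* index tuples currently queued here */
	struct NodeBuffer *hash_next;	/* chain link in the toy hash table */
	/* ... page lists, on-disk block numbers, etc. ... */
} NodeBuffer;

#define NBUCKETS 1024

static NodeBuffer *buffer_hash[NBUCKETS];
static size_t bookkeeping_bytes = 0;	/* grown whenever a buffer is created */

/* Hypothetical helpers: choose a low-level buffer and push its tuples down. */
NodeBuffer *pick_low_level_buffer(void);
void		empty_buffer(NodeBuffer *buf);

/* Drop all bookkeeping for a buffer the moment it becomes completely empty. */
void
maybe_destroy_buffer(NodeBuffer *buf)
{
	NodeBuffer **slot;

	if (buf->ntuples > 0)
		return;

	/* unlink from the hash table */
	slot = &buffer_hash[buf->blkno % NBUCKETS];
	while (*slot != buf)
		slot = &(*slot)->hash_next;
	*slot = buf->hash_next;

	bookkeeping_bytes -= sizeof(NodeBuffer);
	free(buf);
}

/*
 * If the per-buffer bookkeeping approaches the memory budget (some fraction
 * of maintenance_work_mem, say), empty low-level buffers so that they become
 * destroyable and their memory can be reclaimed.
 */
void
enforce_memory_limit(size_t budget_bytes)
{
	while (bookkeeping_bytes > budget_bytes)
	{
		NodeBuffer *victim = pick_low_level_buffer();

		if (victim == NULL)
			break;
		empty_buffer(victim);
		maybe_destroy_buffer(victim);
	}
}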


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
