On Wed, Jan 15, 2014 at 10:47 AM, Alexander Korotkov <aekorot...@gmail.com>wrote:
> On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra <t...@fuzzy.cz> wrote: > >> On 14.1.2014 00:38, Tomas Vondra wrote: >> > On 13.1.2014 18:07, Alexander Korotkov wrote: >> >> On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra <t...@fuzzy.cz >> >> <mailto:t...@fuzzy.cz>> wrote: >> >> >> >> On 8.1.2014 22:58, Alexander Korotkov wrote: >> >> > Thanks for reporting. Fixed version is attached. >> >> >> >> I've tried to rerun the 'archie' benchmark with the current patch, >> and >> >> once again I got >> >> >> >> PANIC: could not split GIN page, didn't fit >> >> >> >> I reran it with '--enable-cassert' and with that I got >> >> >> >> TRAP: FailedAssertion("!(ginCompareItemPointers(&items[i - 1], >> >> &items[i]) < 0)", File: "gindatapage.c", Line: >> 149) >> >> LOG: server process (PID 5364) was terminated by signal 6: Aborted >> >> DETAIL: Failed process was running: INSERT INTO messages ... >> >> >> >> so the assert in GinDataLeafPageGetUncompressed fails for some >> reason. >> >> >> >> I can easily reproduce it, but my knowledge in this area is rather >> >> limited so I'm not entirely sure what to look for. >> >> >> >> >> >> I've fixed this bug and many other bug. Now patch passes test suite >> that >> >> I've used earlier. The results are so: >> > >> > OK, it seems the bug is gone. However now there's a memory leak >> > somewhere. I'm loading pgsql mailing list archives (~600k messages) >> > using this script >> > >> > https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py >> > >> > And after loading about 1/5 of the data, all the memory gets filled by >> > the pgsql backends (loading the data in parallel) and the DB gets killed >> > by the OOM killer. >> >> I've spent a fair amount of time trying to locate the memory leak, but >> so far no luck. I'm not sufficiently familiar with the GIN code. >> >> I can however demonstrate that it's there, and I have rather simple test >> case to reproduce it - basically just a CREATE INDEX on a table with ~1M >> email message bodies (in a tsvector column). The data is available here >> (360MB compressed, 1GB raw): >> >> http://www.fuzzy.cz/tmp/message-b.data.gz >> >> Simply create a single-column table, load data and create the index >> >> CREATE TABLE test ( body_tsvector TSVECTOR ); >> COPY test FROM '/tmp/message-b.data'; >> CREATE test_idx ON test USING gin test ( body_tsvector ); >> >> I'm running this on a machine with 8GB of RAM, with these settings >> >> shared_buffers=1GB >> maintenance_work_mem=1GB >> >> According to top, CREATE INDEX from the current HEAD never consumes more >> than ~25% of RAM: >> >> PID USER PR NI VIRT RES SHR %CPU %MEM COMMAND >> 32091 tomas 20 0 2026032 1,817g 1,040g 56,2 23,8 postgres >> >> which is about right, as (shared_buffers + maintenance_work_mem) is >> about 1/4 of RAM. >> >> With the v5 patch version applied, the CREATE INDEX process eventually >> goes crazy and allocates almost all the available memory (but somesimes >> finishes, mostly by pure luck). This is what I was able to get from top >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM COMMAND >> 14090 tomas 20 0 7913820 6,962g 955036 D 4,3 91,1 postgres >> >> while the system was still reasonably responsive. >> > > Thanks a lot for your help! I believe problem is that each decompressed > item pointers array is palloc'd but not freed. I hope to fix it today. > Seems to be fixed in the attached version of patch. Another improvement in this version: only left-most segments and further are re-encoded. Left part of page are left untouched. ------ With best regards, Alexander Korotkov.
gin-packed-postinglists-varbyte6.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers