On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra <t...@fuzzy.cz> wrote:

> On 14.1.2014 00:38, Tomas Vondra wrote:
> > On 13.1.2014 18:07, Alexander Korotkov wrote:
> >> On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
> >>
> >>     On 8.1.2014 22:58, Alexander Korotkov wrote:
> >>     > Thanks for reporting. Fixed version is attached.
> >>
> >>     I've tried to rerun the 'archie' benchmark with the current
> >>     patch, and once again I got
> >>
> >>        PANIC:  could not split GIN page, didn't fit
> >>
> >>     I reran it with '--enable-cassert' and with that I got
> >>
> >>     TRAP: FailedAssertion("!(ginCompareItemPointers(&items[i - 1],
> >>                        &items[i]) < 0)", File: "gindatapage.c", Line: 149)
> >>     LOG:  server process (PID 5364) was terminated by signal 6: Aborted
> >>     DETAIL:  Failed process was running: INSERT INTO messages ...
> >>
> >>     so the assert in GinDataLeafPageGetUncompressed fails for some
> >>     reason.
> >>
> >>     I can easily reproduce it, but my knowledge in this area is rather
> >>     limited so I'm not entirely sure what to look for.
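
(For context: that check is the ordering invariant on a leaf's item
array - posting-list items must stay in strictly ascending TID order.
In rough outline, what gindatapage.c asserts there is:

    int i;

    for (i = 1; i < nitems; i++)
        Assert(ginCompareItemPointers(&items[i - 1], &items[i]) < 0);

so the patch was producing an unsorted or duplicated item array
somewhere.)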
> >>
> >>
> >> I've fixed this bug and many other bugs. Now the patch passes the
> >> test suite that I've used earlier. The results are as follows:
> >
> > OK, it seems the bug is gone. However, now there's a memory leak
> > somewhere. I'm loading pgsql mailing list archives (~600k messages)
> > using this script
> >
> >    https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
> >
> > And after loading about 1/5 of the data, all the memory gets filled by
> > the pgsql backends (loading the data in parallel) and the DB gets killed
> > by the OOM killer.
>
> I've spent a fair amount of time trying to locate the memory leak, but
> so far no luck. I'm not sufficiently familiar with the GIN code.
>
> I can, however, demonstrate that it's there, and I have a rather simple
> test case to reproduce it - basically just a CREATE INDEX on a table
> with ~1M email message bodies (in a tsvector column). The data is
> available here (360MB compressed, 1GB raw):
>
>    http://www.fuzzy.cz/tmp/message-b.data.gz
>
> Simply create a single-column table, load the data and create the index:
>
>    CREATE TABLE test ( body_tsvector TSVECTOR );
>    COPY test FROM '/tmp/message-b.data';
>    CREATE INDEX test_idx ON test USING gin ( body_tsvector );
>
> I'm running this on a machine with 8GB of RAM, with these settings
>
>    shared_buffers=1GB
>    maintenance_work_mem=1GB
>
> According to top, CREATE INDEX from the current HEAD never consumes more
> than ~25% of RAM:
>
>     PID USER      PR  NI    VIRT    RES    SHR  %CPU %MEM  COMMAND
>   32091 tomas     20   0 2026032 1,817g 1,040g  56,2 23,8  postgres
>
> which is about right, as (shared_buffers + maintenance_work_mem) adds
> up to 2GB, i.e. about 1/4 of the 8GB of RAM.
>
> With the v5 patch version applied, the CREATE INDEX process eventually
> goes crazy and allocates almost all the available memory (but sometimes
> finishes, mostly by pure luck). This is what I was able to get from top:
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM  COMMAND
>   14090 tomas     20   0 7913820 6,962g 955036 D   4,3 91,1  postgres
>
> while the system was still reasonably responsive.
>

Thanks a lot for your help! I believe the problem is that each decompressed
item-pointer array is palloc'd but never freed. I hope to fix it today.
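
For the archives, the fix should basically amount to freeing the decoded
copy once we're done with it. A minimal sketch of the pattern (the names
ginPostingListDecode and plist here are illustrative stand-ins for the
patch's actual helpers, not verbatim code):

    /*
     * Illustrative only: the decoded item-pointer array is palloc'd by
     * the decompression helper. If the caller never pfree's it, every
     * call leaks one array into the surrounding memory context, which
     * during CREATE INDEX lives long enough to exhaust memory.
     */
    int         nitems;
    ItemPointer items = ginPostingListDecode(plist, &nitems);

    /* ... scan/merge the decoded items ... */

    pfree(items);               /* the missing free */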

------
With best regards,
Alexander Korotkov.
