Dear Hackers,

Kuroda-san and I are interested in the GIN index and have been testing various things. While testing, we found a small bug: in some cases, the value of nEntries in the metapage is set to the wrong value.
Here is how to reproduce the bug (the pageinspect extension is needed for gin_metapage_info and get_raw_page).

=# SET maintenance_work_mem TO '1MB';
=# CREATE TABLE foo(i jsonb);
=# INSERT INTO foo(i) SELECT jsonb_build_object('foobar001', i) FROM generate_series(1, 10000) AS i;
-- Insert the same values again.
=# INSERT INTO foo(i) SELECT jsonb_build_object('foobar001', i) FROM generate_series(1, 10000) AS i;
-- Create the GIN index.
=# CREATE INDEX foo_idx ON foo USING gin (i jsonb_ops) WITH (fastupdate=off);
=# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
-[ RECORD 1 ]----+-----------
pending_head     | 4294967295
pending_tail     | 4294967295
tail_free_size   | 0
n_pending_pages  | 0
n_pending_tuples | 0
n_total_pages    | 74
n_entry_pages    | 69
n_data_pages     | 4
n_entries        | 20004 <--★
version          | 2

In this example, the nEntries value should be 10001, because the GIN index stores duplicate values in a single leaf entry (posting tree or posting list). But if we look at the nEntries value of the metapage using pageinspect, it is stored as 20004.

So, let's run VACUUM.

=# VACUUM foo;
=# SELECT * FROM gin_metapage_info(get_raw_page('foo_idx', 0));
-[ RECORD 1 ]----+-----------
pending_head     | 4294967295
pending_tail     | 4294967295
tail_free_size   | 0
n_pending_pages  | 0
n_pending_tuples | 0
n_total_pages    | 74
n_entry_pages    | 69
n_data_pages     | 4
n_entries        | 10001 <--★
version          | 2

After running VACUUM, nEntries changes to the correct value.

The problem is in the ginEntryInsert function. When a GIN index is created, the table is scanned and the ginBuildCallback function stores each new heap value in the GinBuildState struct. Then, once the memory used by the GinBuildState struct is equal to or larger than maintenance_work_mem, the accumulated values are inserted into the GIN index. This is done by the ginEntryInsert function. When called at this point, ginEntryInsert assumes that a new entry is being added and increases the value of nEntries.
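To make the counting problem easier to see, here is a toy model in C. It is only a sketch and not the PostgreSQL code: ToyIndex, toy_find, toy_insert_count_first, toy_insert_count_new_only and toy_build are all hypothetical names, and the entry tree is replaced by a flat array. It assumes the accumulated values are flushed twice and that both flushes carry the same keys, as happens in the reproduction above when the memory limit is reached more than once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KEYS 16

/* Hypothetical stand-in for the GIN entry tree: a flat array of distinct keys. */
typedef struct ToyIndex
{
    int    keys[TOY_MAX_KEYS];
    size_t nkeys;       /* number of distinct keys actually stored */
    long   n_entries;   /* plays the role of the nEntries statistic */
} ToyIndex;

/* Linear search; returns true if the key is already stored. */
bool
toy_find(const ToyIndex *idx, int key)
{
    for (size_t i = 0; i < idx->nkeys; i++)
        if (idx->keys[i] == key)
            return true;
    return false;
}

/* Count first, then look the key up: duplicates are counted again. */
void
toy_insert_count_first(ToyIndex *idx, int key)
{
    idx->n_entries++;
    if (toy_find(idx, key))
        return;     /* existing entry: a real index would merge the items here */
    assert(idx->nkeys < TOY_MAX_KEYS);
    idx->keys[idx->nkeys++] = key;
}

/* Look the key up first, and count only when a new entry is really added. */
void
toy_insert_count_new_only(ToyIndex *idx, int key)
{
    if (toy_find(idx, key))
        return;
    assert(idx->nkeys < TOY_MAX_KEYS);
    idx->keys[idx->nkeys++] = key;
    idx->n_entries++;
}

/* Simulate two flushes of the in-memory accumulator that carry the same keys. */
long
toy_build(void (*insert)(ToyIndex *, int))
{
    static const int flush1[] = {1, 2, 3};
    static const int flush2[] = {1, 2, 3};
    ToyIndex idx = {0};

    for (size_t i = 0; i < sizeof(flush1) / sizeof(flush1[0]); i++)
        insert(&idx, flush1[i]);
    for (size_t i = 0; i < sizeof(flush2) / sizeof(flush2[0]); i++)
        insert(&idx, flush2[i]);
    return idx.n_entries;
}
```

With these inputs, toy_build(toy_insert_count_first) returns 6 although only 3 distinct keys are stored, while toy_build(toy_insert_count_new_only) returns 3. The nEntries value in the metapage is expected to behave like the second variant.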
However, ginEntryInsert currently increases the value of nEntries first, and only afterwards determines whether the same entry already exists in the GIN index. That causes the bug.

The patch is very simple: it increases the value of nEntries only when a new (non-duplicate) GIN index leaf entry is added.

Kuroda-san and I found this bug and worked on the fix together.

Best regards,
Moon.
GIN_Metapage_bugfix.patch