On 25.06.2013 01:24, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 1:21 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:

On Mon, Jun 17, 2013 at 10:27 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:

That has some obvious limitations. First of all, you can run out of
memory.

Yes, that's right. qsort should be replaced with tuplesort.

In the attached patch, qsort is replaced with tuplesort. As expected, this leads to some performance degradation, but it's not dramatic.
Also, some documentation is added for the new GIN distance method.

Thanks. But the fact remains that with this patch, you fetch all the tuples and then sort them, which is exactly what happens without the patch too; it just seems to happen more slowly without the patch. Time to break out the profiler...

I downloaded what I believe to be the same DBLP titles dataset that you tested with. Without the patch:

postgres=# explain analyze select * from dblp_titles where tsvector @@
to_tsquery('english', 'statistics') order by ts_rank(tsvector,
to_tsquery('english', 'statistics')) limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=42935.81..42935.84 rows=10 width=64) (actual time=57.945..57.948 rows=10 loops=1)
   ->  Sort  (cost=42935.81..42980.86 rows=18020 width=64) (actual time=57.943..57.944 rows=10 loops=1)
         Sort Key: (ts_rank(tsvector, '''statist'''::tsquery))
         Sort Method: top-N heapsort  Memory: 28kB
         ->  Bitmap Heap Scan on dblp_titles  (cost=211.66..42546.41 rows=18020 width=64) (actual time=13.061..46.358 rows=15142 loops=1)
               Recheck Cond: (tsvector @@ '''statist'''::tsquery)
               ->  Bitmap Index Scan on gin_title  (cost=0.00..207.15 rows=18020 width=0) (actual time=8.339..8.339 rows=15142 loops=1)
                     Index Cond: (tsvector @@ '''statist'''::tsquery)
 Total runtime: 57.999 ms
(9 rows)

And the profile looks like this:

6,94%  postgres            tbm_iterate
6,12%  postgres            hash_search_with_hash_value
4,40%  postgres            tbm_comparator
3,79%  libc-2.17.so        __memcpy_ssse3_back
3,68%  postgres            heap_hot_search_buffer
2,62%  postgres            slot_deform_tuple
2,47%  postgres            nocachegetattr
2,37%  postgres            heap_page_prune_opt
2,27%  libc-2.17.so        __memcmp_sse4_1
2,21%  postgres            heap_fill_tuple
2,18%  postgres            pg_qsort
1,96%  postgres            tas
1,88%  postgres            palloc0
1,83%  postgres            calc_rank_or

Drilling into that, the tbm_iterate, tbm_comparator and pg_qsort calls come from the tidbitmap code, as do about 1/3 of the hash_search_with_hash_value calls. There seems to be a fair amount of overhead in building and iterating the tid bitmap. Is that what's killing us?

For comparison, this is the same with your patch:

postgres=# explain analyze select * from dblp_titles where tsvector @@
to_tsquery('english', 'statistics') order by tsvector ><
to_tsquery('english', 'statistics') limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=16.00..52.81 rows=10 width=136) (actual time=9.957..9.980 rows=10 loops=1)
   ->  Index Scan using gin_title on dblp_titles  (cost=16.00..52198.94 rows=14175 width=136) (actual time=9.955..9.977 rows=10 loops=1)
         Index Cond: (tsvector @@ '''statist'''::tsquery)
         Order By: (tsvector >< '''statist'''::tsquery)
 Total runtime: 10.084 ms
(5 rows)

9,57%  postgres            scanGetItemFast
7,02%  postgres            calc_rank_or
5,71%  postgres            FunctionCall10Coll
5,59%  postgres            entryGetNextItem
5,19%  postgres            keyGetOrdering
5,13%  postgres            ginDataPageLeafReadItemPointer
4,89%  postgres            entryShift
4,85%  postgres            ginCompareItemPointers
3,44%  postgres            ginDataPageLeafRead
3,28%  postgres            AllocSetAlloc
3,27%  postgres            insertScanItem
3,18%  postgres            gin_tsquery_distance
2,38%  postgres            puttuple_common
2,26%  postgres            checkcondition_gin
2,20%  postgres            cmpEntries
2,17%  postgres            AllocSetFreeIndex
2,11%  postgres            calc_rank

Unsurprisingly, the tidbitmap overhead is not visible in the profile with your patch.

To see how much of the difference is caused by the tidbitmap overhead, I wrote the attached quick & dirty patch (it will crash and burn with queries that require tidbitmap unions or intersections, etc.). When there are fewer than 10 items on a page, the tidbitmap keeps the offsets of those items in an ordered array of offsets, instead of setting the bits in the bitmap. In the test query, most pages have only 1, maybe 2, hits, so that's a lot cheaper. Here's what the results look like with that patch:

postgres=# explain analyze select * from dblp_titles where tsvector @@
to_tsquery('english', 'statistics') order by ts_rank(tsvector,
to_tsquery('english', 'statistics')) limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=36396.80..36396.82 rows=10 width=136) (actual time=19.532..19.532 rows=0 loops=1)
   ->  Sort  (cost=36396.80..36432.23 rows=14175 width=136) (actual time=19.531..19.531 rows=0 loops=1)
         Sort Key: (ts_rank(tsvector, '''statist'''::tsquery))
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on dblp_titles  (cost=169.86..36090.48 rows=14175 width=136) (actual time=19.525..19.525 rows=0 loops=1)
               Recheck Cond: (tsvector @@ '''statist'''::tsquery)
               ->  Bitmap Index Scan on gin_title  (cost=0.00..166.31 rows=14175 width=0) (actual time=8.310..8.310 rows=15142 loops=1)
                     Index Cond: (tsvector @@ '''statist'''::tsquery)
 Total runtime: 19.583 ms
(9 rows)

So, that's about 3x faster than with no patch, but your patch is still 2x faster than this. However, I believe there's still room for improvement with this approach. The profile with this quick & dirty patch looks like this:

13,01%  postgres            hash_search_with_hash_value
11,03%  postgres            tbm_comparator
 7,10%  postgres            heap_page_prune_opt
 6,60%  postgres            pg_qsort
 4,47%  postgres            expand_table
 4,06%  postgres            scanGetItemFast
 3,82%  postgres            tas
 3,40%  postgres            tbm_iterate
 2,70%  postgres            PinBuffer
 2,30%  postgres            hash_any
 2,16%  postgres            hash_seq_search
 2,07%  postgres            tbm_get_pageentry
 1,86%  postgres            calc_bucket

So, a lot of time is still spent in the tidbitmap. The overhead of tbm_iterate is gone, but much overhead remains elsewhere. The crux of it is that when you have only 1-2 hits per page, the tidbitmap is really expensive, as it allocates a PagetableEntry struct for each page. I believe it would be much cheaper to just accumulate the ItemPointers into a simple array in tbm_add_tuples, sort the array once, and return results from it.
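
Just to illustrate, here's a rough sketch of that approach. This is completely untested, and the TidArray name and helper functions are made up for illustration; they don't exist in tidbitmap.c:

#include "postgres.h"
#include "storage/itemptr.h"

/* hypothetical accumulate-then-sort container, for illustration only */
typedef struct TidArray
{
	ItemPointerData *tids;		/* accumulated heap tuple ids */
	int			ntids;			/* number of valid entries */
	int			maxtids;		/* allocated size of the array */
	bool		sorted;			/* has the array been sorted yet? */
} TidArray;

static int
tidarray_comparator(const void *a, const void *b)
{
	return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

static TidArray *
tidarray_create(void)
{
	TidArray   *ta = palloc(sizeof(TidArray));

	ta->maxtids = 1024;
	ta->tids = palloc(ta->maxtids * sizeof(ItemPointerData));
	ta->ntids = 0;
	ta->sorted = true;
	return ta;
}

/* append tids as-is, with no per-page bookkeeping; O(1) amortized per tid */
static void
tidarray_add_tuples(TidArray *ta, const ItemPointer tids, int ntids)
{
	if (ta->ntids + ntids > ta->maxtids)
	{
		while (ta->ntids + ntids > ta->maxtids)
			ta->maxtids *= 2;
		ta->tids = repalloc(ta->tids, ta->maxtids * sizeof(ItemPointerData));
	}
	memcpy(ta->tids + ta->ntids, tids, ntids * sizeof(ItemPointerData));
	ta->ntids += ntids;
	ta->sorted = false;
}

/* sort once when iteration begins, instead of maintaining per-page bitmaps */
static void
tidarray_begin_iterate(TidArray *ta)
{
	if (!ta->sorted)
	{
		qsort(ta->tids, ta->ntids, sizeof(ItemPointerData),
			  tidarray_comparator);
		ta->sorted = true;
	}
}

The iterator would then just walk the sorted array, grouping consecutive tids that fall on the same page into one result. Lossy pages and union/intersection would still need the current machinery, so this could perhaps be yet another initial state, alongside TBM_EMPTY and TBM_ONE_PAGE, that gets converted to the hash table representation when needed.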

In summary: The test case you presented as motivation for this patch is a bit of a worst-case scenario for the current tidbitmap implementation. The speedup from your patch comes from avoiding the tidbitmap. However, it would be fairly easy to optimize the tidbitmap to handle this scenario better, which would benefit all kinds of queries that use bitmap scans. There is really no reason to complicate the GIN API for this. Let's just optimize tidbitmap.

I'm not sure I fully understand your patch, though. Is there some other test scenario where it performs significantly better, and where the difference cannot be attributed to tidbitmap overhead? I'm assuming 'no' for now, and marking this patch as rejected in the commitfest app, but feel free to reopen if there is.

- Heikki
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 3ef0112..ddb6ee5 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -80,6 +80,9 @@
 /* number of active words for a lossy chunk: */
 #define WORDS_PER_CHUNK  ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)
 
+/* max number of offsets kept as an ordered array before switching to a bitmap */
+#define MAXOFFSETS		10
+
 /*
  * The hashtable entries are represented by this data structure.  For
  * an exact page, blockno is the page number and bit k of the bitmap
@@ -99,9 +102,18 @@ typedef struct PagetableEntry
 	BlockNumber blockno;		/* page number (hashtable key) */
 	bool		ischunk;		/* T = lossy storage, F = exact */
 	bool		recheck;		/* should the tuples be rechecked? */
-	bitmapword	words[Max(WORDS_PER_PAGE, WORDS_PER_CHUNK)];
+	int8		noffsets;		/* number of offsets in boffsets, or -1 if bitmap */
+	union
+	{
+		bitmapword	bwords[Max(WORDS_PER_PAGE, WORDS_PER_CHUNK)];
+		OffsetNumber boffsets[MAXOFFSETS];
+	} wo;
 } PagetableEntry;
 
+/* accessor shims, so the rest of the file can keep saying page->words */
+#define words wo.bwords
+#define boffsets wo.boffsets
+
 /*
  * dynahash.c is optimized for relatively large, long-lived hash tables.
  * This is not ideal for TIDBitMap, particularly when we are using a bitmap
@@ -244,6 +256,31 @@ tbm_create_pagetable(TIDBitmap *tbm)
 	tbm->status = TBM_HASH;
 }
 
+/*
+ * Convert a page entry from the small ordered-array-of-offsets
+ * representation to the normal bitmap representation.
+ */
+static void
+tbm_page_to_bitmap(PagetableEntry *page)
+{
+	int i;
+	OffsetNumber offsets[MAXOFFSETS];
+	int noffsets;
+
+	noffsets = page->noffsets;
+	memcpy(offsets, page->boffsets, sizeof(OffsetNumber) * noffsets);
+	memset(page->boffsets, 0, sizeof(page->boffsets));
+	for (i = 0; i < noffsets; i++)
+	{
+		OffsetNumber off = offsets[i];
+		int wordnum = WORDNUM(off - 1);
+		int bitnum = BITNUM(off - 1);
+		page->words[wordnum] |= ((bitmapword) 1 << bitnum);
+	}
+
+	page->noffsets = -1;
+}
+
 /*
  * tbm_free - free a TIDBitmap
  */
@@ -270,13 +307,14 @@ tbm_add_tuples(TIDBitmap *tbm, const ItemPointer tids, int ntids,
 			   bool recheck)
 {
 	int			i;
+	BlockNumber prevblk = InvalidBlockNumber;
+	PagetableEntry *page = NULL;
 
 	Assert(!tbm->iterating);
 	for (i = 0; i < ntids; i++)
 	{
 		BlockNumber blk = ItemPointerGetBlockNumber(tids + i);
 		OffsetNumber off = ItemPointerGetOffsetNumber(tids + i);
-		PagetableEntry *page;
 		int			wordnum,
 					bitnum;
 
@@ -287,20 +325,56 @@ tbm_add_tuples(TIDBitmap *tbm, const ItemPointer tids, int ntids,
 		if (tbm_page_is_lossy(tbm, blk))
 			continue;			/* whole page is already marked */
 
-		page = tbm_get_pageentry(tbm, blk);
+		if (prevblk != blk)
+			page = tbm_get_pageentry(tbm, blk);
+		prevblk = blk;	/* cache the entry for consecutive tids on one page */
 
 		if (page->ischunk)
 		{
 			/* The page is a lossy chunk header, set bit for itself */
 			wordnum = bitnum = 0;
+			page->words[0] |= 1;
 		}
 		else
 		{
 			/* Page is exact, so set bit for individual tuple */
-			wordnum = WORDNUM(off - 1);
-			bitnum = BITNUM(off - 1);
+			if (page->noffsets >= MAXOFFSETS)
+			{
+				/* switch to bitmap */
+				tbm_page_to_bitmap(page);
+			}
+			if (page->noffsets >= 0)
+			{
+				int		j;
+				bool	found = false;
+
+				/* find the insertion point, keeping the array ordered */
+				for (j = 0; j < page->noffsets; j++)
+				{
+					if (page->boffsets[j] == off)
+					{
+						found = true;	/* already present */
+						break;
+					}
+					if (page->boffsets[j] > off)
+						break;
+				}
+				if (!found)
+				{
+					/* shift the tail up by one and insert (handles append) */
+					memmove(&page->boffsets[j + 1], &page->boffsets[j],
+							sizeof(OffsetNumber) * (page->noffsets - j));
+					page->boffsets[j] = off;
+					page->noffsets++;
+				}
+			}
+			else
+			{
+				wordnum = WORDNUM(off - 1);
+				bitnum = BITNUM(off - 1);
+				page->words[wordnum] |= ((bitmapword) 1 << bitnum);
+			}
 		}
-		page->words[wordnum] |= ((bitmapword) 1 << bitnum);
 		page->recheck |= recheck;
 
 		if (tbm->nentries > tbm->maxentries)
@@ -711,21 +785,29 @@ tbm_iterate(TBMIterator *iterator)
 			page = tbm->spages[iterator->spageptr];
 
 		/* scan bitmap to extract individual offset numbers */
-		ntuples = 0;
-		for (wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
+		if (page->noffsets >= 0)
 		{
-			bitmapword	w = page->words[wordnum];
-
-			if (w != 0)
+			ntuples = page->noffsets;
+			memcpy(output->offsets, page->boffsets, sizeof(OffsetNumber) * page->noffsets);
+		}
+		else
+		{
+			ntuples = 0;
+			for (wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
 			{
-				int			off = wordnum * BITS_PER_BITMAPWORD + 1;
+				bitmapword	w = page->words[wordnum];
 
-				while (w != 0)
+				if (w != 0)
 				{
-					if (w & 1)
-						output->offsets[ntuples++] = (OffsetNumber) off;
-					off++;
-					w >>= 1;
+					int			off = wordnum * BITS_PER_BITMAPWORD + 1;
+
+					while (w != 0)
+					{
+						if (w & 1)
+							output->offsets[ntuples++] = (OffsetNumber) off;
+						off++;
+						w >>= 1;
+					}
 				}
 			}
 		}