Hi,

Playing around with my hashtable implementation [1] and using it for bitmap scans / tidbitmap.c, I noticed something curious: with small work_mem (4MB in my case) on large-ish tables (~5GB), performance tanks. That's primarily caused by some issues in the hashtable code (since fixed), but it made me look more closely at the relevant tidbitmap code, namely tbm_lossify():
static void
tbm_lossify(TIDBitmap *tbm)
{
    ...
    pagetable_start_iterate_random(tbm->pagetable, &i);
    while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    {
        if (page->ischunk)
            continue;           /* already a chunk header */

        /*
         * If the page would become a chunk header, we won't save anything by
         * converting it to lossy, so skip it.
         */
        if ((page->blockno % PAGES_PER_CHUNK) == 0)
            continue;

        /* This does the dirty work ... */
        tbm_mark_page_lossy(tbm, page->blockno);

        if (tbm->nentries <= tbm->maxentries / 2)
        {
            /* we have done enough */
            break;
        }
    }

tbm_mark_page_lossy() in turn deletes the individual page entry and adds one for the chunk (spanning several pages). The relevant part is that the loop stops as soon as enough page ranges have been lossified. That leads to two problems:

Firstly, because iteration (both in my implementation and in dynahash) is essentially in bucket order, the front of the hashtable ends up mostly empty, whereas later parts of the hashtable stay full (i.e. have a lot of collisions). The reason is that we lossify whichever exact pages happen to sit early in the hash space, while the chunk entries replacing them can hash to anything, quite possibly later in the table. And since the number of entries in the hashtable doesn't increase, we'll never grow the hashtable to reduce the number of collisions. I've seen that cause a quite noticeable number of conflicts (which of course hurt more in an open addressing table than in a separately chained one).

Secondly, it leads to pretty random parts of the bitmap being compressed. For performance it'd be better to compress "heavily used" areas of the bitmap, instead of whatever happens to come early in the hash space.

I'm not entirely sure how to resolve that best, but it does seem worth addressing. One thing that might be nice is to record the last N insertions once the table is ~90% full or so, and then lossify in a more targeted manner using those, e.g. lossify the block ranges (currently 256 pages long) that contain a lot of individual entries. A rough sketch of that is at the end of this mail.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20160727004333.r3e2k2y6fvk2ntup%40alap3.anarazel.de
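
For reference, the bookkeeping as I read tbm_mark_page_lossy(): an exact page folds into the chunk header covering its PAGES_PER_CHUNK-sized block range, i.e. the exact entry is deleted and one bit is set in that header, creating the header if it doesn't exist yet. So a range holding k exact pages collapses into a single entry, a net saving of k (or k - 1 if the header has to be created), which is the arithmetic that makes dense ranges the attractive lossification targets. Sketch of the mapping, not the actual source:

    /*
     * My reading of tbm_mark_page_lossy(): the exact entry for blockno is
     * deleted, and one bit is set in the chunk header whose blockno is the
     * first block of the PAGES_PER_CHUNK-sized range containing it.
     */
    static BlockNumber
    chunk_header_blockno(BlockNumber blockno)
    {
        return blockno - (blockno % PAGES_PER_CHUNK);
    }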
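
And here's a very rough sketch of the targeted variant mentioned above; RecentRanges, record_insertion() and RECENT_INSERTS are made-up names, none of this exists in tidbitmap.c:

    /*
     * Hypothetical: once nentries crosses ~90% of maxentries, start
     * remembering which 256-page range each insertion lands in, so that
     * lossification can target the ranges that are actually hot instead of
     * whatever the hash iteration order hands us.
     */
    #define RECENT_INSERTS  1024            /* "N"; value picked out of thin air */

    typedef struct RecentRanges
    {
        BlockNumber ranges[RECENT_INSERTS]; /* blockno / PAGES_PER_CHUNK */
        int         next;                   /* ring buffer position */
        bool        active;                 /* flipped on at ~90% of maxentries */
    } RecentRanges;

    /* would be called from the insertion path (e.g. tbm_add_tuples()) */
    static inline void
    record_insertion(RecentRanges *rr, BlockNumber blockno)
    {
        if (!rr->active)
            return;
        rr->ranges[rr->next] = blockno / PAGES_PER_CHUNK;
        rr->next = (rr->next + 1) % RECENT_INSERTS;
    }

tbm_lossify() could then count the duplicates in ranges[] and call tbm_mark_page_lossy() for the pages of the most frequently hit ranges first, falling back to the current hash-order walk only if that doesn't free enough entries.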