Re: [HACKERS] GIN improvements part 1: additional information

Heikki Linnakangas Wed, 18 Dec 2013 06:51:38 -0800

On 12/18/2013 01:45 PM, Alexander Korotkov wrote:

On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
  2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.


I'm not sure I agree with that. For all the cases where you don't care
about additional information - which covers all existing users for example
- reducing disk size is pretty important. How are you planning to store the
additional information, and how does using another encoding get in the way
of that?


I was planning to store additional information datums between
varbyte-encoded tids. I expected it would be hard to do with PFOR.
However, I don't see significant problems in your implementation of Simple9
encoding. I'm going to dig deeper into your version of the patch.


Ok, thanks.

I had another idea about the page format this morning. Instead of having
the item-indexes at the end of the page, it would be more flexible to
store a bunch of self-contained posting list "segments" on the page. So
I propose that we get rid of the item-indexes, and instead store a bunch
of independent posting lists on the page:


typedef struct
{
    ItemPointerData first;   /* first item in this segment (unpacked) */
    uint16      nwords;      /* number of words that follow */
    uint64      words[1];    /* var length */
} PostingListSegment;

Each segment can be encoded and decoded independently. When searching
for a particular item (like on insertion), you skip over segments where
'first' > the item you're searching for.
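
To make the search concrete, here is a rough sketch of walking the segments packed on a page. It is not code from the patch. Tid is a hypothetical stand-in for ItemPointerData, SegmentHeader mirrors only the fixed part of PostgingListSegment-style headers shown above, and find_segment and append_segment are made-up helper names. It also assumes one particular reading of the search rule: the candidate is the last segment whose 'first' is <= the item, and every segment with a larger 'first' is passed over without being decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ItemPointerData: block number plus offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Fixed-size part of a segment; 'nwords' 64-bit payload words follow it. */
typedef struct { Tid first; uint16_t nwords; } SegmentHeader;

static int tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Return the byte offset of the only segment that could contain 'target'
 * (the last one whose 'first' <= target), or -1 if 'target' sorts before
 * every segment. No payload word is decoded during the walk.
 */
static long find_segment(const unsigned char *page, size_t len, Tid target)
{
    long   found = -1;
    size_t pos = 0;

    while (pos + sizeof(SegmentHeader) <= len)
    {
        SegmentHeader hdr;

        memcpy(&hdr, page + pos, sizeof hdr);   /* avoids alignment traps */
        if (tid_cmp(hdr.first, target) > 0)
            break;                              /* this and all later segments start too high */
        found = (long) pos;
        pos += sizeof hdr + (size_t) hdr.nwords * sizeof(uint64_t);
    }
    return found;
}

/* Test helper: write a header at 'pos' and reserve room for its payload. */
static size_t append_segment(unsigned char *page, size_t pos, Tid first,
                             uint16_t nwords)
{
    SegmentHeader hdr;

    memset(&hdr, 0, sizeof hdr);
    hdr.first = first;
    hdr.nwords = nwords;
    memcpy(page + pos, &hdr, sizeof hdr);
    return pos + sizeof hdr + (size_t) nwords * sizeof(uint64_t);
}
```

Because each header carries its own length, stepping to the next segment needs no separate index area on the page.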

This format offers a lot more flexibility compared to the separate item
indexes. First, we don't need to have another fixed sized area on the
page, which simplifies the page format. Second, we can more easily
re-encode only one segment on the page, on insertion or vacuum. The
latter is particularly important with the Simple-9 encoding, which
operates one word at a time rather than one item at a time; removing or
inserting an item in the middle can require a complete re-encoding of
everything that follows. Third, when a page is being inserted into and
contains only uncompressed items, you don't waste any space for unused
item indexes.
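
To illustrate the word-at-a-time point, below is a small sketch of the textbook Simple-9 scheme on 32-bit words (4 selector bits, 28 data bits). It is not the encoder from the patch, which evidently uses 64-bit words and may pack differently. The s9_* names are invented for this example, and the values being packed are assumed to be small integers such as the deltas between consecutive items.

```c
#include <stddef.h>
#include <stdint.h>

/* The nine classic layouts: how many values share the 28 data bits, and
 * how many bits each value gets. The selector lives in the top 4 bits. */
static const struct { unsigned count, bits; } s9_modes[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}
};

/* Pack as many leading values as fit into one word. Returns the number of
 * values consumed, or 0 if the first value needs more than 28 bits. */
static size_t s9_pack_word(const uint32_t *vals, size_t n, uint32_t *word)
{
    for (unsigned m = 0; m < 9; m++)
    {
        unsigned count = s9_modes[m].count, bits = s9_modes[m].bits;
        uint32_t limit = (uint32_t) 1 << bits;
        uint32_t w = (uint32_t) m << 28;
        unsigned i;

        if (count > n)
            continue;               /* not enough values left for this layout */
        for (i = 0; i < count && vals[i] < limit; i++)
            w |= vals[i] << (i * bits);
        if (i < count)
            continue;               /* some value is too wide for this layout */
        *word = w;
        return count;
    }
    return 0;
}

/* Encode n values; returns the number of words written, 0 on overflow. */
static size_t s9_encode(const uint32_t *vals, size_t n, uint32_t *out)
{
    size_t nwords = 0;

    while (n > 0)
    {
        size_t used = s9_pack_word(vals, n, &out[nwords]);

        if (used == 0)
            return 0;
        vals += used;
        n -= used;
        nwords++;
    }
    return nwords;
}

/* Decode nwords words; returns the number of values, 0 on a bad selector. */
static size_t s9_decode(const uint32_t *words, size_t nwords, uint32_t *out)
{
    size_t n = 0;

    for (size_t w = 0; w < nwords; w++)
    {
        unsigned m = words[w] >> 28;

        if (m >= 9)
            return 0;
        for (unsigned i = 0; i < s9_modes[m].count; i++)
            out[n++] = (words[w] >> (i * s9_modes[m].bits))
                       & (((uint32_t) 1 << s9_modes[m].bits) - 1);
    }
    return n;
}
```

With this sketch, the values 1..14 pack into two words of seven 4-bit values each. Inserting a single large value (200) after the third one shifts every later word boundary, so all words from that point on come out different and the list grows to four words. Keeping segments independent limits that ripple to one segment.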

While we're at it, I think we should use the above struct in the inline
posting lists stored directly in entry tuples. That wastes a few bytes
compared to the current approach in the patch (more alignment, and
'words' is redundant with the number of items stored on the tuple
header), but it simplifies the functions handling these lists.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
