i think there is one more thing which would be really good in GIN and which would solve a ton of issues. atm GIN entries are sorted by item pointer. if we could sort them by a "column" it would fix a couple of real work issues such as ...
SELECT ... FROM foo WHERE "tsearch_query" ORDER BY price DESC LIMIT 10 ... or so. it many cases you want to search for a, say, product and find the cheapest / most expensive one. if the tsearch_query yields a high number of rows (which it often does) the subsequent sort will kill you. many thanks, hans On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote: > While hacking on the GIN patches, I've come up with a few different ideas for > improving performance. It's too late for 9.4, but I'll list them here if > someone wants to work on them later: > > * Represent ItemPointers as uint64's, to speed up comparisons. > ginCompareItemPointers is inlined into only a few instructions, but it's > still more expensive than a single-instruction 64-bit comparison. > ginCompareItemPointers is called very heavily in a GIN scan, so even a small > improvement there would make for a noticeable speedup. It might be an > improvement in code clarity, too. > > * Keep the entry streams of a GinScanKey in a binary heap, to quickly find > the minimum curItem among them. > > I did this in various versions of the fast scan patch, but then I realized > that the straightforward way of doing it is wrong, because a single > GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is > updated as part of advancing one key, and the entry is in a heap of another > key, updating the curItem can violate the heap property of the other entry's > heap. > > * Build a truth table (or cache) of consistent-function's results, and use > that instead of calling consistent for every item. > > * Deduce AND or OR logic from the consistent function. Or have the opclass > provide a tree of AND/OR/NOT nodes directly, instead of a consistent > function. For example, if the query is "foo & bar", we could avoid calling > consistent function altogether, and only return items that match both. > > * Delay decoding segments during a scan. Currently, we decode all segments of > a posting tree page into a single array at once. But with "fast scan", we > might be able to skip over all entries in some of the segments. So it would > be better to copy the segments into backend-private memory in compressed > format, and decode them one segment at a time (or maybe even one item at a > time), when needed. That would avoid the unnecessary decoding of segments > that can be skipped over, and would also reduce memory usage of a scan. > > I'll add these to the TODO. > > - Heikki > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers