Re: [HACKERS] Small GIN optimizations (after 9.4)

PostgreSQL - Hans-Jürgen Schönig Thu, 06 Feb 2014 08:32:52 -0800

i think there is one more thing which would be really good in GIN and which 
would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a "column" it would fix a couple of real work issues 
such as ...


        SELECT ... FROM foo WHERE "tsearch_query" ORDER BY price DESC LIMIT 10 

... or so.
it many cases you want to search for a, say, product and find the cheapest / 
most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the 
subsequent sort will kill you.

        many thanks,

                hans


On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:

> While hacking on the GIN patches, I've come up with a few different ideas for 
> improving performance. It's too late for 9.4, but I'll list them here if 
> someone wants to work on them later:
> 
> * Represent ItemPointers as uint64's, to speed up comparisons. 
> ginCompareItemPointers is inlined into only a few instructions, but it's 
> still more expensive than a single-instruction 64-bit comparison. 
> ginCompareItemPointers is called very heavily in a GIN scan, so even a small 
> improvement there would make for a noticeable speedup. It might be an 
> improvement in code clarity, too.
> 
> * Keep the entry streams of a GinScanKey in a binary heap, to quickly find 
> the minimum curItem among them.
> 
> I did this in various versions of the fast scan patch, but then I realized 
> that the straightforward way of doing it is wrong, because a single 
> GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is 
> updated as part of advancing one key, and the entry is in a heap of another 
> key, updating the curItem can violate the heap property of the other entry's 
> heap.
> 
> * Build a truth table (or cache) of consistent-function's results, and use 
> that instead of calling consistent for every item.
> 
> * Deduce AND or OR logic from the consistent function. Or have the opclass 
> provide a tree of AND/OR/NOT nodes directly, instead of a consistent 
> function. For example, if the query is "foo & bar", we could avoid calling 
> consistent function altogether, and only return items that match both.
> 
> * Delay decoding segments during a scan. Currently, we decode all segments of 
> a posting tree page into a single array at once. But with "fast scan", we 
> might be able to skip over all entries in some of the segments. So it would 
> be better to copy the segments into backend-private memory in compressed 
> format, and decode them one segment at a time (or maybe even one item at a 
> time), when needed. That would avoid the unnecessary decoding of segments 
> that can be skipped over, and would also reduce memory usage of a scan.
> 
> I'll add these to the TODO.
> 
> - Heikki
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Small GIN optimizations (after 9.4)

Reply via email to