On one fine day, Tue, 2007-10-23 at 13:04, Heikki wrote:
> Gokulakannan Somasundaram wrote:
> > Say, with a normal index, you need to goto the table for checking the
> > snapshot. So you would be loading both the index pages + table pages, in
> > order to satisfy a certain operations. Whereas in thick index you occupy 16
> > bytes per tuple more in order to avoid going to the table. So memory
> > management is again better. But i can run the load test, if that's
> > required.
> Yes, performance testing is required for any performance-related patch.
> Remember that you're competing against DSM. We're going to want some
> kind of a DSM anyway because it allows skipping unmodified parts of the
> heap in vacuum.

I would suggest using just an additional heap with decoupled
visibility fields as the DSM.

For a large number of usage scenarios this will be highly compressible
and will mostly stay in processor caches.

You can start slow, and have the info duplicated in both the main heap
and the visibility heap (aka DSM).

There are several advantages to keeping a separate visibility heap:

1) it is usually highly compressible; at the very least you can throw
away cmin/cmax quite soon, and usually also FREEZE and RLE-encode the
rest.

2) faster access, thanks to more tightly packed data pages.

3) index-only scans

4) superfast VACUUM FREEZE 

5) makes VACUUM faster even in worst cases (interleaving live and dead
tuples)
6) any index scan will be faster due to fetching only visible rows from
main heap.
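To illustrate point 1, here is a toy sketch (not PostgreSQL code; the state names and page layout are made up for illustration) of why a decoupled visibility heap compresses so well: after freezing, long runs of identical per-tuple states collapse to a handful of run-length entries.

```python
# Toy illustration of RLE-compressing a "visibility heap".
# States and layout are hypothetical, not actual PostgreSQL structures.
from itertools import groupby

FROZEN = "frozen"   # visible to every snapshot; xmin/xmax no longer needed
LIVE = "live"
DEAD = "dead"

def rle_encode(states):
    """Collapse consecutive identical visibility states into (state, run_length)."""
    return [(s, len(list(g))) for s, g in groupby(states)]

# A mostly-frozen page: 100 tuples, only two recently modified.
page = [FROZEN] * 60 + [LIVE, DEAD] + [FROZEN] * 38
encoded = rle_encode(page)
print(len(encoded))  # 4 runs instead of 100 per-tuple entries
```

A page like this stays a few dozen bytes regardless of tuple count, which is what lets the whole structure sit in processor caches.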

> >  Even when all the tuples are in memory, index only scans are
> > almost 40-60% faster than the index scans with thin indexes.
> Have you actually benchmarked that? What kind of query was that? I don't
> believe for a second that fetching the heap tuple when the page is in
> memory accounts for 40-60% of the overhead of regular index scans.

It depends heavily on which kind of memory the data is in (PostgreSQL
buffer cache or OS disk cache).

I remember doing Slony subscribes in the early days, and the speed
difference when loading a table with an active PK index was several-fold,
depending on the shared_buffers setting.

That was for a table where both heap and index fit in the 2G of memory
available; the difference was only whether pages were being shuffled
between the PostgreSQL buffer cache and the Linux system cache or not.

> BTW, another issue you'll have to tackle, that a DSM-based patch will
> have to solve as well, is how to return tuples from an index. In b-tree,
> we scan pages page at a time, keeping a list of all tids that match the
> scanquals in BTScanOpaque. If we need to return not only the tids of the
> matching tuples, but the tuples as well, where do we store them? You
> could make a palloc'd copy of them all, but that seems quite expensive.

Have you considered returning them as "already visibility-checked pages",
similar to what views or set-returning functions return?
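A rough sketch of what I mean (hypothetical, nothing like the actual BTScanOpaque code): instead of palloc'ing a copy of each matching tuple individually, the scan materializes one batch ("page") of tuples that already passed the visibility check and hands the whole batch to the caller, much like a set-returning function.

```python
# Hypothetical sketch of returning "already visibility-checked pages".
# Names and batching scheme are illustrative, not PostgreSQL internals.

def visibility_checked_pages(index_matches, is_visible, page_size=4):
    """Yield lists ("pages") of tuples that already passed the visibility check."""
    batch = []
    for tup in index_matches:
        if is_visible(tup):
            batch.append(tup)
            if len(batch) == page_size:
                yield batch
                batch = []
    if batch:  # flush the final partial page
        yield batch
```

The caller then consumes whole batches at a time, so no per-tuple copy management is needed on the scan side.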

