Re: [HACKERS] Including Snapshot Info with Indexes

Gokulakannan Somasundaram Fri, 12 Oct 2007 05:00:23 -0700

On 10/12/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
> Gokulakannan Somasundaram wrote:
> If records have just been inserted to a block, it is in cache. Therefore
> hitting that block to check visibility isn't going to cost much. There
> might be some middle-ground where a tuple has been inserted a while ago,
> so that the block has already been evicted from cache, but the
> transaction hasn't yet been committed, but that's a pretty narrow use
> case.
>
> Note that we can flag a page in the DSM not only by VACUUM, but by any
> backend as soon as all tuples are visible to everyone. You do have to
> scan the tuple headers on the page to determine that, but that's not so
> much overhead, and it could be offloaded to the bgwriter.



The first step in any database tuning of course is to reduce Random I/Os.
But then the next step is to reduce the logical I/Os,
Whether the I/O happens from the Disk/Memory, we should try to reduce the
access to a shared resource as much as possible.
So even if the newly inserted tuples are in memory / disk, restricting the
access to it would improve the overall performance of the system(That place
can be taken over by other blocks). If we see the overall picture,
scalability of the database gets increased, as we reduce the number of locks
being taken.


Yep, true. A long-running transaction like that is problematic anyway,
> because we can't vacuum away any dead rows generated during that period.


It is not problematic for the Indexes with snapshot. They will be working as
usual. And i think one reason why timestamp based databases got an advantage
over locking based databases is that batch jobs can run together with OLTP
transactions. In order to maintain that advantage in PostgreSQL, Indexes
with snapshot helps.

Why do you think that the DBA needs to register tables to the DSM
> manually? Surely that would happen automatically.


Accepted.

The patch submitted for 8.3 did use a global lock, and a fixed size
> shared memory area, but those were exactly the reasons the patch was
> rejected. It will have to be reworked for 8.4.


Ok, then the best case rework to my understanding would be to include the
bitmap DSM block number  into the locking attributes. But still one DSM
block would be mapping to 8192 * 8 = 65536 Table blocks. So if you have to
add a unregistered chunk of a newly created partition, then any access into
the 65536 blocks will have to get stalled, which may not be acceptable under
the OLTP performance requirements. This becomes a performance overhead for
people maintaining daily partitions, as they create a new table everyday and
the space would be increasing from morning till evening and the same table
would be queried the most.

I'm not convinced that's true. We only need super-exclusive locks on
> index pages for interlocking index and heap accesses with non-MVCC
> snapshots, IOW system tables. And since the lock is only held for a
> short time and infrequently, it hasn't been a problem at all.


For a heap with no indexes, we don't take super-exclusive lock to do Vacuum
on it. We just  need to take Exclusive lock on each block and do the Vacuum
on it. That's because the table contains the necessary visibility
information. But with indexes, we may need to refer to the table in order to
do Vacuum. In the mean-while we don't want any page splits to happen. That's
why we take a super exclusive lock on all the leaf  pages (no body should
even have a pin on one of them Ref : README file in nbtree directory) But if
we have the snapshot info into the indexes, then we can just do a index
scan(similar to the heap scan described before) by taking Exclusive lock on
pages one by one and Vacuuming them.


> ii) The above would mean that index can be in operation while the vacuum
> is
> > happening
>
> Huh? VACUUM hasn't locked out other access since version 7.2!


I might have missed something here. If we need Super-Exclusive lock on all
leaf pages in the index to do the Vacuum(no-one should be even having a PIN
on it), then how are we able to allow index scans while the Vacuum is
happening? In my case, the index will have the snapshot information. so no
super exclusive locks, so other leaf pages can be accessed.
If there is a explanation, it might also be useful to update the README file
in  the nbtree directory


> iii) As we have already stated, it provides a efficient clustering of
> > related data.
>
> Sorry, I missed that part. What's that?


Say i create a index on SalespersonId, Date, Customer Name, Details of the
Transaction on a table. For a query like

select Customer Name, Details of the transaction from table where
salespersonid='xyz' order by date.

The entire query gets satisfied by the Index. We will not goto the table. In
short the index acts like an IOT in oracle/ Clustered indexes in other
databases. The necessary information is obtained from one place, since
snapshot is stored in the index
 It will be very useful, especially when the query is going to return more
records.

The *run-time* complexity of that will only be there for UDF indexes,
> but the *code* complexity will always be there. Sorry, I think the
> probability of a reverse mapping index being accepted to Postgres is
> very close to zero.


I too think that the concept of reverse-mapping index is un-necessary.  To
me, if someone creates a function, he is the author of it and he holds the
responsibility to define proper attributes.  But that's my personal opinion
and i have to think from the community's perspective.

There are only few ways to approach it.

a) We can state clearly that Indexes cannot be created on mutable functions.
If someone creates it, whenever we do the traversal from table to index, we
can have an additional check on the Tuple-Id.  If the tuple-id is not
matched, we can just drop the index(since it has proved to be a mutable one)
and issue a NOTICE. I doubt whether this kind of approach is in the
philosophy of PostgreSQL.

b) We should maintain some reverse mapping. If a reverse mapping index is
complex, we can have a hash-table for reverse mapping. It can take the
Tuple-id as input and store the calculated function values and can retrieve
it based on the tuple-id.

I think this would especially help the performance of HOT,where currently
there is a restriction that the updated tuple should find a space in the
same block. It can work across blocks, if either of the solutions are
accepted.

Expecting your Comments...


Thanks,
Gokul.

Re: [HACKERS] Including Snapshot Info with Indexes

Reply via email to