On Thu, 2002-04-04 at 14:17, Oleg Bartunov wrote:
Subject: Bidirectional hard joins
>
> Hi,
>
>
> Here we propose some essential improvement of postgreSQL functionality,
> which may provide a great perfomance increase.
>
> 1. Problem
>
> The fastest way to find and fetch a record from a table is to
> perform a SELECT ... WHERE record.id = value.
> Probably, an index scan would be performed for this SELECT.
>
> Such index scan seems to be fast, but there are some cases where it may
> appear too slow. The most evident case is the case of a sub-query, which
> can arise as a result of a join or a nested select statement.
>
> If it were possible to store direct references to database
> records in the tables, joins could be implemented in much more effective
> way.
>
> 2. Possible soultion
>
> Creating a datatype which stores direct reference
> to the record (i.e., physical location of the tuple) is only a part of the
> solution.
The tid type does exaclty what is needed.
>
> When a record that is referenced is updated its physical location can be
> changed, so the references to it should be updated. To make this
> possible, the referenced record should remember all the references to
> itself. Thus, we consider the direct tuple references as bidirectional
> links, or "bidirectional hard joins".
>
> These "hard joins" are in some sense similar to hard links in a
> filesystem (in this analogy, classic joins are like symbolic links).
>
> Philosophically, this means a convergence between indexes and tables: a
> table behaives like an index for an other table.
>
> Obviously, this is is a nonrelational feature, and it requires some
> special SQL syntax. Below we provide some examples for clarification of
> the use of the proposed feature.
Or we can just use tid's and ordinary joins to make it a relational
feature.
IIRC this has been discussed on this list a few months ago. I'm not sure
if bi-directional tid usage was discussed, but I can't see how to use
them efficiently in a non-overwrite storage manager.
...
> 4. Performance
>
> When direct joins are used in select statements, they can strongly
> increase performance.
>
> Let us examine the query plan of the request ("Find all Bob's
> children") from the example above in the present day postgres.
> create table man (id SERIAL,name text);
> create table child (id SERIAL,name text, parent_id int4 references man(id));
> .. populate the tables ... and create indexes...
> explain select child.name from child, man
> where child.parent_id = man.id
> and man.name = 'Bob Scott';
>
> Nested Loop
> -> Index Scan using man_n on man
> -> Index Scan using child_par on child
>
>
>
> In a hypotetical postgres with hard joins it could be:
>
> Nested Loop
> -> Index Scan using man_n on man
> -> Direct retrieval on child
>
> I.e., the for each retrieved "man" record we retrieve the "child" records
> directly using hard join. The real overhead for this operation should be
> neglible in comparison with index scan.
OTOH, if index is in memory and the retrieved tuple is not then the
_speed_difference_ could be neglible.
> Using the hard joins require some additional overhead in updates. In fact,
> after updating the record which takes part in such join, the references
> to this record in the other records should be also updated.
And this should be in a non-overwriting way. If we just do a standard
UPDATE, causing a new heap record to be added this will result in a
circle as then the original records references are not valid anymore and
so also need to be updated and so on ...
> This operation
> is not essentially new for postgres as similar things are done with
> indexes when an indexed record is updated. Hence, the overhead for updates
> is not greater than the overhead for updating indexes.
AFAIK indexes are not "updated" but a new index entry is added as the
old tuple may be still visible to some other transaction.
> 5. Implementation and conclusion
>
> Effective implementing of hard joins requires hard changes to postgres,
> most serious of them probably in the executor, where a new method "fetch
> record by reference" should be added in addition to "index scan" and "seq
> scan". Also the optimizer should be taught to deal with this.
>
> The update support is not so hard as it is similar to the updating of
> indexes.
>
> Though the implementation of such hard joins is really a complicated task,
> the performance it brings should be tremendous, so we consider discussing
> this important.
Depending on usage the performance degradation can also be tremendous,
as a simple update can trigger an avalance of referencing tid updates
...
--------------
Hannu
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])