Hi,
I've revisited the idea to somehow use foreign keys to do joins,
in the special but common case when joining on columns that exactly match a
foreign key.
The idea is to add a new ternary operator, which would be allowed only in the
FROM clause.
It would take three operands:
1) referencing_table_alias
2) foreign_key_constraint_name
3) referenced_table_alias
POSSIBLE BENEFITS
* Eliminate risk of joining on the wrong columns
Although probably an uncommon class of bugs, a join can be made on the wrong
columns, which could go undetected if the desired row is included by
coincidence, such as if the test environment might only contain a single row in
some table, and the join condition happened to be always true.
By joining using the foreign key, it can be verified at compile time, that the
referenced_table_alias is actually an alias for the table referenced by the
foreign key. If some other alias would be given, an error would be thrown, to
avoid failure.
* Conciser syntax
In a traditional join, you have to explicitly state all columns for the
referencing and referenced table.
I think writing joins feels like you are repeating the same table aliases and
column names over and over again, all the time.
This is especially true for multiple-column joins.
This is somewhat addressed by the USING join form, but USING has other
drawbacks, why I tend to avoid it except for one-off queries.
When having to use fully-qualified table aliases, that adds even further to the
verboseness.
* Makes abnormal joins stand out
If joining on something else than foreign key columns, or some inequality
expression, such joins will continue to be written in the traditional way, and
will therefore stand out and be more visible, if all other foreign key-based
joins are written using the new syntax.
When reading SQL queries, I think this would be a great improvement, since the
boring normal joins on foreign keys could be given less attention, and focus
could instead be made on making sure you understand the more complex joins.
* Explicit direction of the join
In a traditional join on foreign key columns, it's not possible to derive if
the join is a one-to-many or many-to-one join, by just looking at the SQL code
itself. One must also know/inspect the data model or make assumptions based on
the naming of columns and tables. This is perhaps the least interesting benefit
though, since good naming makes the direction quite obvious anyway. But I think
it at least reduces the total cognitive load of reading a SQL query.
POSSIBLE DRAWBACKS
* Another thing users would have to learn
* Would require changes to the SQL standard, i.e. SQL committee work
* Introduces a hard dependency on foreign keys, they cannot be dropped
SYNTAX
Syntax is hard, but here is a proposal to start the discussion:
from_item join_type from_item WITH
[referencing_table_alias]->[foreign_key_constraint_name] =
[referenced_table_alias] [ AS join_using_alias ]
EXAMPLE
To experiment with the idea, I wanted to find some real-world queries written
by others,
to see how such SQL queries would look like, using traditional joins vs foreign
key joins.
I came up with the idea of searching Github for "LEFT JOIN", since just
searching for "JOIN" would match a lot of non-SQL code as well.
Here is one of the first examples I found, a query below from the Grafana
project [1]
[1]
https://github.com/grafana/grafana/blob/main/pkg/services/accesscontrol/database/resource_permissions.go
SELECT
p.*,
? AS resource_id,
ur.user_id AS user_id,
u.login AS user_login,
u.email AS user_email,
tr.team_id AS team_id,
t.name AS team,
t.email AS team_email,
r.name as role_name
FROM permission p
LEFT JOIN role r ON p.role_id = r.id
LEFT JOIN team_role tr ON r.id = tr.role_id
LEFT JOIN team t ON tr.team_id = t.id
LEFT JOIN user_role ur ON r.id = ur.role_id
LEFT JOIN user u ON ur.user_id = u.id
WHERE p.id = ?
Here is how the FROM clause could be rewritten:
FROM permission p
LEFT JOIN role r WITH p->permission_role_id_fkey = r
LEFT JOIN team_role tr WITH tr->team_role_role_id_fkey = r
LEFT JOIN team t WITH tr->team_role_team_id_fkey = t
LEFT JOIN user_role ur WITH ur->user_role_role_id_fkey = r
LEFT JOIN "user" u WITH ur->user_role_user_id_fkey = u
WHERE p.id = 1;
In PostgreSQL, the foreign keys could also be given shorter names, since they
only need to be unique per table and not per namespace. I think a nice
convention is to give the foreign keys the same name as the referenced table,
except if the same table is referenced multiple times or is self-referenced.
Rewriting our example, using such naming convention for the foreign keys:
FROM permission p
LEFT JOIN role r WITH p->role = r
LEFT JOIN team_role tr WITH tr->role = r
LEFT JOIN team t WITH tr->team = t
LEFT JOIN user_role ur WITH ur->role = r
LEFT JOIN "user" u WITH ur->user = u
WHERE p.id = 1;
A better example to illustrate how conciseness is improved, would be one with
lots of multi-column joins.
Please feel free to share better query examples to evaluate.
I cannot stop thinking about this idea, I really think it would greatly improve
SQL as a language.
Foreign keys feels like such an underused valuable potential resource!
If someone can convince me this is a bad idea, that would help me forget about
all of this,
so I would greatly appreciate your thoughts, no matter how negative or positive.
Thank you for digesting.
/Joel
PS.
To readers who might remember the old flawed version of this new idea:
In the old proposal, the third operand (referenced_table_alias) was missing.
There wasn't a way of specifying what table alias the join was supposed to be
made against.
It was assumed the referenced table was always the one being joined in,
which is not always the case, since the referenced table
might already be in scope, and it's instead the referencing table which is
being joined in.
Another problem with the old idea was you were forced to write the joins in a
the same order
as the foreign keys, which often resulted in an awkward join order.
These two problems have now been solved with this new proposal.
Perhaps new problems have been introduced though?