Hi,

When a query references a normal relation, the RangeTblEntry we
construct sets rte->eref->aliasname to either the user-specified alias
name, if there is one, or otherwise to the table name. But if there's
neither a user-specified alias name nor a table name, then we make up
a name. I have two related questions about this. First, why do we do
it at all? Second, why do we assign the names as we do?

Let me start with the second question. After some surveying of the
source code, here's a (possibly-incomplete) list of eref names that we
fabricate: old, new, *SELECT*, ANY_subquery, *MERGE*, *RESULT*,
*SELECT*, excluded, unnamed_subquery, unnamed_join, *GROUP*, *TLOCRN*,
*TROCRN*, *SELECT* %d where %d is an integer, *VALUES*, xmltable,
json_table. Of these, old, new, and excluded seem to make total sense
-- in certain contexts, the user can actually use those names in SQL
expressions to refer to stuff. But the rest, to my knowledge, can't be
referenced within the query, so I suppose they're just for display
purposes. But that seems like an odd choice, because these artificial
names are (1) not unique, which means that if they are referenced
anywhere such references are ambiguous; (2) not consistent in terms of
capitalization and punctuation, which is questionable for something
whose primary purpose is to inform the user; and (3) sometimes
incomprehensible -- I can make nothing of *TLOCRN* or *TROCRN*, even
after looking at the relevant source code, and I wouldn't know what
the distinction is between *SELECT* and *SELECT* %d without looking at
the source code.

And that brings me to the first question, which is why we're even
making up these names at all (with the exceptions of new, old, and
excluded, which serve a clear purpose). If we needed them to identify
things, that would make sense, but since they're neither unique nor
comprehensible, they're not really any good for that. And it seems
downright confusing at best, and a source of bugs at worst, to mix
together user-supplied names and system-generated names in such a way
that the one can't be easily distinguished from the other. We even
have regression tests verifying that if the user explicitly enters
unnamed_join or unnamed_subquery as an alias name, it doesn't break
anything due to confusion with identical alias names that might be
generated internally, which seems like good evidence for the
proposition that I'm not the first person to worry about this being
bug-prone. Granted, there are some cases where these names make their
way into EXPLAIN output. For example, this query from the regression
tests:

select * from int4_tbl o where (f1, f1) in
  (select f1, generate_series(1,50) / 10 g from int4_tbl i group by f1);

produces EXPLAIN output that includes this:

         ->  Subquery Scan on "ANY_subquery"
               Output: "ANY_subquery".f1, "ANY_subquery".g
               Filter: ("ANY_subquery".f1 = "ANY_subquery".g)

However, that seems to be a minority position. Many of the
system-generated eref names do not appear in EXPLAIN output, at least
not in our regression tests. Furthermore, EXPLAIN itself does
post-processing of the relations that appear in the output to set the
final names that are displaced (see set_rtable_names()), so if we
didn't insert names at parse time, we'd still have an opportunity to
make up something for display purposes. You could argue that the
results would be worse, but the current results aren't especially
amazing so I'm not sure I believe that. Perhaps with some elbow grease
they would even be better.

So overall I'm just confused here. Is this just a bunch of old cruft
that has never been cleaned up or standardized, or does it serve some
valuable purpose that is not obvious to me?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Reply via email to