While reviewing another patch, I noticed that the documentation had an xref
to a fairly large page of documentation (create_table.sgml). I wondered
whether that link was chosen because the original author genuinely felt the
entire page was relevant, or merely because a more granular link did not
exist at the time and the link has been carried forward ever since, while
the referenced page grew in complexity.

In the interest of narrowing the problem down to a manageable size, I wrote
a script (attached) to find all xrefs and rank them by criteria[1] that I
believe hint at xrefs that should be more granular than they are.

I intend to use the script output below as a guide for manually reviewing
the references and seeing if there are opportunities to guide the reader to
the relevant section of those pages.

In case anyone is curious, here is the top of the script output:

file_name                          link_name                     link_count  line_count  num_refentries
---------------------------------  ----------------------------  ----------  ----------  --------------
ref/psql-ref.sgml                  app-psql                      20          5215        1
ecpg.sgml                          ecpg-sql-allocate-descriptor  4           10101       17
ref/create_table.sgml              sql-createtable               23          2437        1
ref/select.sgml                    sql-select                    23          2207        1
ref/create_function.sgml           sql-createfunction            30          935         1
ref/alter_table.sgml               sql-altertable                12          1776        1
ref/pg_dump.sgml                   app-pgdump                    11          1545        1
ref/pg_basebackup.sgml             app-pgbasebackup              11          1008        1
ref/create_type.sgml               sql-createtype                10          1029        1
ref/create_index.sgml              sql-createindex               9           999         1
ref/postgres-ref.sgml              app-postgres                  10          845         1
ref/copy.sgml                      sql-copy                      7           1081        1
ref/create_role.sgml               sql-createrole                13          511         1
ref/grant.sgml                     sql-grant                     13          507         1
ref/create_foreign_table.sgml      sql-createforeigntable        14          455         1
ref/insert.sgml                    sql-insert                    8           792         1
ref/pg_ctl-ref.sgml                app-pg-ctl                    8           713         1
ref/create_trigger.sgml            sql-createtrigger             7           777         1
ref/set.sgml                       sql-set                       15          332         1
ref/create_aggregate.sgml          sql-createaggregate           6           805         1
ref/initdb.sgml                    app-initdb                    8           588         1
ref/create_policy.sgml             sql-createpolicy              7           655         1
dblink.sgml                        contrib-dblink-connect        1           2136        19
ref/create_subscription.sgml       sql-createsubscription        9           472         1

Some of these will clearly be false positives. For instance, dblink.sgml
and ecpg.sgml have a lot of refentries, but they seem to lack the global
"top" refentry that I had assumed would be there.

On the other hand, I have to wonder whether the references to psql are
really to specific features of the tool, in which case perhaps we can create
refentries for those.

[1] The criteria are: the target must be the first refentry in its file, the
file must be at least 200 lines long, and results are ranked by
lines*references, doubled when the top refentry is referenced and other
refentries exist in the same file.
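
For readers who do not want to open the attachment, the ranking in [1] can be
sketched roughly as follows. This is not the attached shell script, just a
minimal Python illustration of the same scoring rule. The names XrefTarget,
score and rank are made up for this sketch, the input rows are assumed to
already be the first refentry of each file, and the last sample row
(tiny.sgml) is a made-up entry to show the 200-line cutoff.

```python
from typing import NamedTuple

MIN_LINES = 200  # files shorter than this are not ranked at all


class XrefTarget(NamedTuple):
    # One row of the script output; assumed to be the first refentry in its file.
    file_name: str
    link_name: str
    link_count: int
    line_count: int
    num_refentries: int


def score(t: XrefTarget) -> int:
    """lines * references, doubled when the file holds other refentries too."""
    if t.line_count < MIN_LINES:
        return 0
    s = t.line_count * t.link_count
    if t.num_refentries > 1:
        s *= 2
    return s


def rank(targets: list[XrefTarget]) -> list[XrefTarget]:
    """Drop files under the cutoff, then sort by descending score."""
    kept = [t for t in targets if score(t) > 0]
    return sorted(kept, key=score, reverse=True)


sample = [
    XrefTarget("dblink.sgml", "contrib-dblink-connect", 1, 2136, 19),
    XrefTarget("ref/create_table.sgml", "sql-createtable", 23, 2437, 1),
    XrefTarget("ecpg.sgml", "ecpg-sql-allocate-descriptor", 4, 10101, 17),
    XrefTarget("ref/psql-ref.sgml", "app-psql", 20, 5215, 1),
    XrefTarget("tiny.sgml", "hypothetical-tiny", 50, 120, 1),  # made-up row
]

for t in rank(sample):
    print(f"{score(t):>7}  {t.file_name}")
```

With the sample rows above, psql-ref.sgml scores 5215*20 = 104300 and
ecpg.sgml scores 10101*4*2 = 80808, which matches the order of the excerpt.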

Attachment: xref-analysis.sh
Description: application/shellscript
