> Grep worries me because it selects a lot of names that aren't in the pattern
> file
No, it does not. You do not understand what it does and/or do not use it
properly. Besides the missing -F option (as I explained in my previous
post), every pattern should certainly end with a tab. For instance, line
297123 of SS.HN-GLU-MB-January2020-PTRs-Rndm.txt is "union". As a
consequence, 'grep -f SS.HN-GLU-MB-January2020-PTRs-Rndm.txt' outputs every
line containing "union", i.e.:
$ grep union SS.IPv4-NLU-Joined-HN-GLU-January2020-slash24.PTRs_.Tally_.txt
unallocated.unioncom.net.ua 249
r-r-resale-dba-once-upon-a-child-9148-union.static.fuse.net 4
chaco-credit-union-10m-fuse.static.fuse.net 4
mail.unionbankph.com 3
gw.interunion.ru 2
Including line 297123, 128 lines in SS.HN-GLU-MB-January2020-PTRs-Rndm.txt
include "union":
$ zgrep -c union SS.HN-GLU-MB-January2020-PTRs-Rndm.txt_0.gz
128
The presence of the 127 other lines makes no difference whatsoever in the
output of 'grep -f SS.HN-GLU-MB-January2020-PTRs-Rndm.txt'.
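To make the point concrete, here is a minimal sketch on made-up data. The file names (patterns.txt, tally.txt) and their contents are hypothetical stand-ins for the real files, and the tally is assumed to be tab-separated (name, tab, count):

```shell
# Hypothetical miniature files, created in a scratch directory.
tmp=$(mktemp -d)
printf 'union\n' > "$tmp/patterns.txt"
printf 'union\t7\nmail.unionbankph.com\t3\ngw.interunion.ru\t2\n' > "$tmp/tally.txt"

# Patterns used as-is: every line containing "union" anywhere is counted.
loose=$(grep -c -f "$tmp/patterns.txt" "$tmp/tally.txt")

# Append a tab to every pattern and match fixed strings (-F):
# only names that end with "union" right before the tab are counted.
awk '{ printf "%s\t\n", $0 }' "$tmp/patterns.txt" > "$tmp/patterns_tab.txt"
strict=$(grep -c -F -f "$tmp/patterns_tab.txt" "$tmp/tally.txt")

echo "loose=$loose strict=$strict"
rm -rf "$tmp"
```

Note that the trailing tab only anchors the end of the name: a name such as "credit-union" would still be selected. Comparing the whole field is what join does.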
> The join, sort, and comm-based scripts all were executed orders of magnitude
> faster than the grep script.
The overall run time of those scripts is dominated by sort's, which is
linearithmic in the number of lines in the larger of the two files. Because
grep must output the lines in the order of the (potentially infinite) input
file, its run time grows with the product of the number of lines in that file
and the number of patterns (for each processed line, all patterns are
enumerated): that is much worse if both files are large. Also, without -F,
grep interprets the patterns as regular expressions: it is obviously more
expensive to match a regular expression than a fixed string. Finally, grep
searches for the pattern in the whole line and not only in one specific
field, as join does.
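For comparison, a sort-then-join version can be sketched as follows. Again, the file names and contents are hypothetical, and the tally is assumed to be tab-separated with the name in the first field; this is an illustration, not the script that was actually timed:

```shell
# Hypothetical miniature files, created in a scratch directory.
tmp=$(mktemp -d)
printf 'union\nexample.org\n' > "$tmp/names.txt"
printf 'mail.unionbankph.com\t3\nunion\t7\ngw.interunion.ru\t2\n' > "$tmp/tally.txt"
tab=$(printf '\t')

# join requires both inputs sorted on the join field, with the same collation.
LC_ALL=C sort "$tmp/names.txt" > "$tmp/names.sorted"
LC_ALL=C sort -t "$tab" -k 1,1 "$tmp/tally.txt" > "$tmp/tally.sorted"

# Keep only the tally lines whose first field is exactly one of the names.
joined=$(LC_ALL=C join -t "$tab" "$tmp/names.sorted" "$tmp/tally.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Only the line whose first field is exactly "union" is printed; "mail.unionbankph.com" and "gw.interunion.ru" are not, because join compares the whole field rather than searching within the line.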