Grep worries me because it selects many names that are not in the pattern file, even though the run is otherwise orderly and selects only a fraction of the target file's entries. I had thought that grep's advantage was that it would let me identify long PTR records based on permutations of their IPv6 addresses, but no such comparisons arose with the present set of patterns, which is based on IPv4 addresses: the target file's PTRs contained no examples of permuted IPv4 addresses.
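
One common cause of extra matches of this kind, offered here as an assumption and not checked against the attached files, is that grep -f treats each pattern line as a regular expression that may match anywhere in a target line. A dot then matches any character, and a short address also matches inside a longer one. The sketch below uses made-up file names and made-up data to compare plain grep -f with fixed-string (-F) and whole-line (-x) matching:

```shell
#!/bin/sh
# Hypothetical sample data: neither the file names nor the addresses
# come from the attached files.
tmp=$(mktemp -d)
printf '%s\n' '10.0.0.1' > "$tmp/patterns.txt"
printf '%s\n' '10.0.0.1' '10.0.0.11' '10a0b0c1' '210.0.0.1' > "$tmp/target.txt"

# Plain -f: each pattern is a regex, so "." matches any character and the
# pattern may match anywhere in the line.
regex_hits=$(grep -c -f "$tmp/patterns.txt" "$tmp/target.txt")

# -F: patterns are literal strings, but substring matches still count.
fixed_hits=$(grep -c -F -f "$tmp/patterns.txt" "$tmp/target.txt")

# -F -x: literal strings that must match the whole line.
exact_hits=$(grep -c -F -x -f "$tmp/patterns.txt" "$tmp/target.txt")

echo "regex: $regex_hits  fixed: $fixed_hits  exact: $exact_hits"
# With the sample data above: regex: 4  fixed: 3  exact: 1

rm -rf "$tmp"
```

If the target lines hold more than the bare key, -x is too strict and -w (whole-word matching) may be a closer fit, although a dot still counts as a word boundary for -w.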

Join selected just 141 matches, which were easy to recognize because only those matches included the data from the pattern file's second column. Comm also selects those 141 matches, and I used join to restore their counts column.
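
A minimal sketch of the join and comm approach, assuming a two-column pattern file (key, count) and a one-column target file. The file names and sample rows are hypothetical, not taken from the attached files. Both tools need their inputs sorted in the same collation order, so everything runs under LC_ALL=C:

```shell
#!/bin/sh
# Hypothetical sample data: illustrative file names and rows only.
tmp=$(mktemp -d)
printf '%s\n' '10.0.0.2 3' '192.0.2.9 1' '10.0.0.1 7' > "$tmp/patterns.txt"
printf '%s\n' '198.51.100.4' '10.0.0.11' '192.0.2.9' '10.0.0.1' > "$tmp/target.txt"

# Sort both inputs the same way before joining or comparing.
LC_ALL=C sort -k1,1 "$tmp/patterns.txt" > "$tmp/patterns.sorted"
LC_ALL=C sort "$tmp/target.txt" > "$tmp/target.sorted"

# join: exact match on field 1; matched lines carry the counts column.
join_out=$(LC_ALL=C join "$tmp/patterns.sorted" "$tmp/target.sorted")

# comm -12: lines common to both files, so compare the keys alone.
cut -d' ' -f1 "$tmp/patterns.sorted" > "$tmp/keys.sorted"
comm_out=$(LC_ALL=C comm -12 "$tmp/keys.sorted" "$tmp/target.sorted")

# Restore the counts column for the comm result with a second join.
printf '%s\n' "$comm_out" > "$tmp/common.sorted"
restored=$(LC_ALL=C join "$tmp/patterns.sorted" "$tmp/common.sorted")

printf '%s\n' "$restored"

rm -rf "$tmp"
```

Because join and comm compare whole keys, 10.0.0.11 in the sample target is not reported as a match for 10.0.0.1.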

The join, sort, and comm-based scripts all ran orders of magnitude faster than the grep script.

Attached are the original pattern file and a target file that has been randomized (sort -R) and reduced in length (from more than one million rows to 300,000). On these files, both join and comm find 35 matches in short order.
