Regarding "sort" ...
After going back to my collection of gratuitously looked-up hostnames and
organizing the potential spreadsheet so as to contain the essential
information (hostname, visits-per-domain, domains visited, and target
domain), I realized that LibreOffice Calc cannot handle more than a million
rows ... and my file contains 1.7 million. So I decided to sort the file
first on Column 3, then on Column 1, and then on Column 2.
Accordingly, I rearranged the columns in the order $3, $1, $2, $4 and
sorted with "sort -nrk 1,4", where "nr" (numeric, reversed) puts the
biggest numbers at the top of the column. But sort evidently did not reach
the third column, so the result was ordered only by hostname and
visits-per-domain.
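A likely explanation, offered tentatively: in GNU sort, -k 1,4 defines one single key that runs from field 1 through field 4, and -n compares only the leading number of that key, so the later fields never get a numeric comparison. Rows that tie on that number fall back to a comparison of the whole line as text (reversed by -r). Giving each column its own key with its own modifiers avoids both the problem and the rearranging. The sketch below assumes whitespace-separated columns in the original order (hostname, visits-per-domain, domains visited, target domain), treats columns 2 and 3 as numbers and column 1 as text, and uses a small made-up sample; sample.txt and sorted.txt are hypothetical names.

```shell
#!/bin/sh
# Byte-wise comparison in the C locale: predictable, and usually faster
# on large files.
export LC_ALL=C

# Made-up sample in the original column order (hypothetical data):
# hostname  visits-per-domain  domains-visited  target-domain
cat > sample.txt <<'EOF'
a.example.net 12 3 weather.example
b.example.net 900 1 weather.example
c.example.net 40 3 shop.example
d.example.net 7 1 news.example
EOF

# One key per column, each with its own modifiers:
#   -k3,3nr  column 3 only, numeric, largest first
#   -k1,1    then column 1 only, as text, A to Z
#   -k2,2nr  then column 2 only, numeric, largest first
sort -k3,3nr -k1,1 -k2,2nr sample.txt > sorted.txt
cat sorted.txt
```

On the rearranged file ($3, $1, $2, $4) the equivalent would be sort -k1,1nr -k2,2 -k3,3nr, with no column shuffling needed beforehand.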
That was sufficient, because it allowed me to split the file at the
1,000,000 mark, whereupon I completed the sort with Calc. Rows from
1,000,001 onward have about one domain visit per hostname; in the
secondary portion they are, of course, numbered 1 through 700,001.
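The split at the 1,000,000 mark can also be done from the command line before anything is opened in Calc. A minimal sketch with head and tail; big.txt, part1.txt and part2.txt are hypothetical names, and a numbered dummy file of 1.7 million lines stands in for the real data so the split points are easy to check.

```shell
#!/bin/sh
# Stand-in for the sorted 1.7-million-row file: one number per line
# (hypothetical data, only there to make the split points visible).
awk 'BEGIN { for (i = 1; i <= 1700000; i++) print i }' > big.txt

# First 1,000,000 rows: small enough to open in Calc.
head -n 1000000 big.txt > part1.txt

# Everything from row 1,000,001 to the end; in the new file it starts
# again at line 1.
tail -n +1000001 big.txt > part2.txt

wc -l part1.txt part2.txt
```

split -l 1000000 big.txt part_ does the same in one step, writing part_aa, part_ab, and so on.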
Good thing I'm doing it this way: sorting the secondary list in the order
$2, $1, $3, $4 reveals some eye-popping numbers. The uppermost ~1500 of
those rows have visitors making anywhere from one request per hour up to
over three visits per second (upwards of 900,000 in a month) ... the
uppermost visits-per-domain were to a meteorological domain, but other
visitors, with less understandable motives, were making about a hundred
visits per hour.
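The ordering of the secondary list ($2, $1, $3, $4) can likewise be produced with sort by keying on column 2 first, and awk can convert the monthly visit counts into hourly and per-second rates. This sketch assumes the counts cover a 30-day month (720 hours, 2,592,000 seconds), since the exact log period isn't stated; secondary.txt, rates.txt and the three sample rows are made up for illustration.

```shell
#!/bin/sh
# C locale so sort compares bytes and awk prints "." as the decimal point.
export LC_ALL=C

# Made-up rows in the original column order (hypothetical data):
# hostname  visits-per-domain  domains-visited  target-domain
cat > secondary.txt <<'EOF'
x.example.org 720 1 shop.example
w.example.org 900000 1 weather.example
y.example.org 72000 1 news.example
EOF

# Column 2 (visits) numeric, largest first; columns 1 and 3 break ties.
# Keep the top 1500 rows and append rates, assuming a 30-day month:
# 30*24 = 720 hours and 30*24*3600 = 2592000 seconds.
sort -k2,2nr -k1,1 -k3,3nr secondary.txt | head -n 1500 |
  awk '{ printf "%s %d %.1f/h %.3f/s\n", $1, $2, $2 / 720, $2 / 2592000 }' > rates.txt
cat rates.txt
```

Under that 30-day assumption, one request per hour comes to 720 a month, 900,000 a month averages about 1,250 per hour, and a sustained three per second would be roughly 7.8 million.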
George Langford