Regarding "sort" ...

After going back to my collection of gratuitously looked-up hostnames, and after organizing the potential spreadsheet so as to contain the essential information (hostname, visits-per-domain, domains visited, and target domain), I realized that LibreOffice Calc cannot handle more than about a million rows, and my file contains 1.7 million. So I decided to sort the file, first on Column 3, then on Column 1, and then on Column 2.

Accordingly, I rearranged the columns thusly: $3, $1, $2, $4 and sorted with "sort -nrk 1,4", where "nr" puts the biggest numbers at the top of the column. However, sort evidently did not reach the third column, resulting in an ordering on only hostname and visits-per-domain.
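
A note on why that may have happened: to sort, "-k 1,4" is one key running from field 1 through field 4, not four separate keys, and under "-n" only the leading number of that key is compared, with ties falling back to a whole-line comparison. Below is a minimal sketch of a multi-key alternative, not the command actually used. It assumes tab-separated input, and the file names hostnames.tsv and sorted.tsv, as well as the sample rows, are made up for illustration.

```shell
# Hypothetical sample rows: hostname, visits-per-domain, domains visited, target domain
printf 'a.example\t5\t2\tx.org\nb.example\t9\t7\ty.org\nc.example\t3\t7\tz.org\na.example\t8\t2\tw.org\n' > hostnames.tsv

# Reorder the columns to $3, $1, $2, $4, then give sort one -k per column:
#   field 1 (domains visited)    numeric, largest first
#   field 2 (hostname)           plain text, ascending
#   field 3 (visits-per-domain)  numeric, largest first
awk -F'\t' -v OFS='\t' '{ print $3, $1, $2, $4 }' hostnames.tsv |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2 -k3,3nr > sorted.tsv

cat sorted.tsv
```

Each "-kN,N" stops the key at the end of field N, so the later keys are consulted only when the earlier ones tie.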

That was sufficient, because it allowed me to split the file at the 1,000,000 mark, whereupon I completed the sort with Calc. Rows 1,000,001 through 1,700,000 have about one domain visit per hostname; the actual numbering of the secondary portion is 1 through 700,001, of course.
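
For anyone wanting to repeat the split, here is one possible way to do it with head and tail. This is only a sketch: the file names are hypothetical, and a 25-line stand-in file with a limit of 10 replaces the real 1.7-million-row file and its 1,000,000 mark.

```shell
# Hypothetical stand-in for the sorted file: 25 numbered rows
seq 1 25 > sorted.txt
LIMIT=10   # would be 1000000 for the real file

# The first LIMIT rows go to one file, everything after them to a second file
head -n "$LIMIT" sorted.txt > part1.txt
tail -n +"$((LIMIT + 1))" sorted.txt > part2.txt

wc -l part1.txt part2.txt
```

Running "split -l 1000000" on the file should give the same pieces in one step, at the cost of less readable output names.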

Good thing I'm doing it this way: sorting the secondary list thusly: $2, $1, $3, $4 reveals some eye-popping numbers. The uppermost ~1500 of those rows have visitors making anywhere from one request per hour up to about one visit every three seconds (upwards of 900,000 in a month). The uppermost visits-per-domain were to a meteorological domain, but there were other visitors with less understandable motives making about a hundred visits per hour.

George Langford
