We wrote:

> I realized that LibreOffice Calc. cannot handle more than a million rows ...

> Spreadsheets are only meant to do computation on little data.
> To store many data, use text files or a database management system.

Starting with 134 sets of recent visitor data, the spreadsheet comes to 1.7
million rows, eventually expanding to about fifteen columns. It's not
unwieldy yet at that point, even with an adults' table and a kids' table.

Your interest & expertise in data mining can best be brought to bear after
the spreadsheet is populated to the extent that I envision. The multi-
address PTR records can have millions of cells for each such name. There
will be patterns in the visits to the domains and their varied subject
matter. Some CIDR address spaces are filled with large numbers of different
multi-address PTR records, which demand the database treatment. Not to
mention the IPv6 data, which have ballooned to about a third of the entire
data set.
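
As a rough illustration of what "the database treatment" might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `ptr`, its columns, and the sample host names and addresses are all hypothetical (documentation-range addresses), not taken from the actual spreadsheet. It counts distinct addresses per PTR name and reports the IPv6 share of the rows.

```python
import ipaddress
import sqlite3


def load_records(conn, records):
    """Store (name, address) pairs, tagging each row as IPv4 (4) or IPv6 (6)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ptr ("
        "name TEXT NOT NULL, address TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO ptr VALUES (?, ?, ?)",
        [(name, addr, ipaddress.ip_address(addr).version) for name, addr in records],
    )
    conn.commit()


def multi_address_names(conn, min_addresses=2):
    """Return (name, distinct address count) for names with many addresses."""
    cur = conn.execute(
        "SELECT name, COUNT(DISTINCT address) AS n FROM ptr "
        "GROUP BY name HAVING COUNT(DISTINCT address) >= ? "
        "ORDER BY n DESC, name",
        (min_addresses,),
    )
    return cur.fetchall()


def ipv6_share(conn):
    """Return the fraction of stored rows whose address is IPv6."""
    total, v6 = conn.execute(
        "SELECT COUNT(*), SUM(version = 6) FROM ptr"
    ).fetchone()
    return (v6 or 0) / total if total else 0.0


# Hypothetical sample data; an on-disk file name could replace ":memory:".
conn = sqlite3.connect(":memory:")
load_records(conn, [
    ("host-a.example.net", "192.0.2.10"),
    ("host-a.example.net", "2001:db8::1"),
    ("host-a.example.net", "192.0.2.11"),
    ("host-b.example.net", "198.51.100.4"),
])
multi = multi_address_names(conn)
share = ipv6_share(conn)
```

Unlike a spreadsheet, a table like this has no practical row limit at the sizes discussed here, and the grouping is done by the database engine rather than by hand.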

The number of ISPs that choose to publish the recent visitor data will
grow exponentially (I hope), and that will make my approach burdensome;
it has already doubled between June 2019 and January 2020.

It's great that sort can be "trained" to do the same kind of multi-stage
sorting task that appears to be built into LibreOffice Calc. By June I'll
be forced to face up to that homework assignment.
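
For comparison, a multi-stage sort on a text file can be sketched in a few lines of Python. This is only an illustration under assumptions: the tab-separated layout (name, address, visit count) and the sample rows are hypothetical. It relies on Python's sort being stable, so sorting by the lowest-priority key first and the highest-priority key last gives the same result as a multi-level sort in a spreadsheet.

```python
import csv
import io
from operator import itemgetter


def multi_stage_sort(rows, keys):
    """Sort rows by several columns; `keys` lists the highest priority first.

    Each key is (column_index, numeric, reverse). The passes run from the
    lowest-priority key to the highest, relying on sort stability.
    """
    result = list(rows)
    for col, numeric, reverse in reversed(keys):
        if numeric:
            result.sort(key=lambda r, c=col: float(r[c]), reverse=reverse)
        else:
            result.sort(key=itemgetter(col), reverse=reverse)
    return result


# Hypothetical tab-separated sample: PTR name, address, visit count.
SAMPLE = (
    "host-b.example.net\t192.0.2.7\t3\n"
    "host-a.example.net\t192.0.2.10\t5\n"
    "host-a.example.net\t2001:db8::1\t12\n"
    "host-b.example.net\t198.51.100.4\t9\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))

# Stage 1: name ascending; stage 2: visit count descending within each name.
sorted_rows = multi_stage_sort(rows, [(0, False, False), (2, True, True)])
```

Reading from a real file would only require replacing the `io.StringIO(SAMPLE)` object with an opened file handle.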

George Langford

