> I'd like to do some analysis on a large text file of invoices - for
> example, grouping and summing. Is this an efficient way to build the list?
> The file has 4 million rows in it. The first several hundred thousand load
> quickly and then I notice the time between my checkpoints taking longer (10
> secs+ per 10,000). I ran it for about 10 minutes and killed it; it was at
> around 2.4M rows.
I think this is simply a memory problem.
Each line results in a list of 4 'pack'ed symbols, i.e. 5 cells, plus the
symbols, each taking at least 1 cell. So 2.4M rows should take at least
2.3 GB on a 64-bit machine.
The interpreter keeps allocating more and more memory, an additional
megabyte on each garbage collection. You can speed that up if you run
something like

   (gc 2300)

in the beginning. This will pre-allocate 2.3 GB if you have enough RAM.
If the available RAM is smaller, the process will start to thrash pages
and slow down a lot.
In general, I think it is not wise to read so much information into a
single list in memory.
> I played around with other variations of building a list push1, idx, etc
> and as expected they all had a more significant time growth as the list grew
Yes, your way of using 'make' and 'link' is the best.
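For the record, that pattern can be sketched like this (the file name
"invoices.txt", the tab separator and the global *Invoices are assumptions
on my part):

```picolisp
# Sketch: pre-allocate the heap, then build one list per input
# line with 'make'/'link'. Each field is 'pack'ed into a
# transient symbol, as in your code.
(gc 2300)                        # reserve ~2.3 GB of cells up front
(setq *Invoices
   (make
      (in "invoices.txt"
         (while (line)           # 'line' returns NIL at end of file
            (link (mapcar pack (split @ "^I"))) ) ) ) )
```

'link' appends to the 'make' result in constant time, which is why it does
not show the quadratic growth of repeated list concatenation.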
> I'm experimenting with loading into a db right now to see if that yields
> better results.
Yes, that's better.
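If you go that route, a rough sketch with a minimal '+Entity' class might
look like this (all names here, "invoices.db", '+Invoice', 'cust' and
'amt', are made up for illustration; note that 'new!' commits one object
per row, so batching 'new' calls between commits would be faster for 4M
rows):

```picolisp
(load "@lib/db.l")

# Hypothetical minimal schema
(class +Invoice +Entity)
(rel cust (+Ref +String))        # customer key, indexed
(rel amt (+Number))              # amount as a scaled fixpoint number

(pool "invoices.db")

(in "invoices.txt"
   (while (line)
      (let L (mapcar pack (split @ "^I"))
         (new! '(+Invoice)
            'cust (car L)
            'amt (format (cadr L) 2) ) ) ) )  # scale 2 decimal places
```

Grouping and summing can then walk the 'cust' index (e.g. with 'collect'
or 'iter') instead of holding all rows in RAM at once.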