Hi Joe,

> I'd like to do some analysis on a large text file of invoices - for
> example, grouping and summing. Is this an efficient way to build the list?
> 
> The file has 4 million rows in it. The first several hundred thousand load
> quickly and then I notice the time between my checkpoints taking longer (10
> secs+ per 10,000). I ran it for about 10 minutes and killed it and was
> around 2.4M rows

I think this is simply a memory problem.

Each line results in a list of four 'pack'ed symbols, i.e. 5 cells for
the list structure, plus at least one cell per symbol and more for its
name (a cell is 16 bytes on a 64-bit machine). So 2.4M rows easily add
up to something like 2.3 GB.
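
For illustration, here is what a single (hypothetical) tab-separated
line boils down to:

   : (mapcar pack (split (chop "1001^I2011-06-30^IACME^I42.50") "^I"))
   -> ("1001" "2011-06-30" "ACME" "42.50")

That is four cells for the inner list, one more for the 'link' cell in
the outer list, plus the cells of the four new transient symbols.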

The interpreter keeps allocating more and more memory, an additional
megabyte on each garbage collection. You can speed that up if you run

   (gc 2300)

at the beginning. This will pre-allocate 2.3 GB if you have enough RAM.
If the available RAM is smaller, the process will start to thrash pages
and slow down a lot.


In general, I think it is not wise to read so much information into a
flat list.
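
If all you actually need are the grouped sums, you don't have to keep
the rows at all. Here is a sketch that aggregates while reading; the
field positions (customer in the third column, amount in the fourth)
are just guesses:

   (off *Sums)  # association list of (Customer . Sum) pairs

   (in "invoices.txt"
      (until (eof)
         (let L (mapcar pack (split (line) "^I"))
            (accu '*Sums
               (caddr L)                        # group key
               (format (cadddr L) 2) ) ) ) )    # amount as fixpoint, scale 2

   (for X *Sums
      (prinl (car X) ": " (format (cdr X) 2)) )

This keeps a single pair per group in memory, independent of the number
of rows. Note that 'accu' does a linear 'assoc' lookup, so with very
many distinct groups an 'idx' tree (or the database, see below) is
preferable.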



> I played around with other variations of building a list push1, idx, etc
> and as expected they all had a more significant time growth as the list
> grew.

Yes, your way of using 'make' and 'link' is the best.
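
For reference, I assume a loop along these lines (the file name and the
tab separator are my guesses):

   (make
      (in "invoices.txt"
         (until (eof)
            (link (mapcar pack (split (line) "^I"))) ) ) )

'make' keeps a pointer to the tail of the list under construction, so
each 'link' appends in constant time, while 'push1' must scan the whole
list and an 'idx' tree does extra work per insert (and degenerates if
the input happens to be sorted).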


> I'm experimenting with loading into a db right now to see if that yields
> better results.

Yes, that's better. Database objects are external symbols, which are
loaded into the heap only while they are accessed, so the whole data
set never has to fit into memory at once.
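
A minimal sketch of the database route; the class, relation and field
layout are invented for this example:

   (class +Inv +Entity)            # one object per invoice row
   (rel nr (+Key +Number))         # invoice number, unique index
   (rel cus (+Ref +String))        # customer, indexed for grouping
   (rel amt (+Number) 2)           # amount, fixpoint with scale 2

   (pool "invoices.db")

   (in "invoices.txt"
      (until (eof)
         (let L (mapcar pack (split (line) "^I"))
            (new T '(+Inv)
               'nr (format (car L))
               'cus (caddr L)
               'amt (format (cadddr L) 2) ) ) ) )
   (commit)

For millions of rows it is better to 'commit' every few thousand
objects, so that the collector can release them again. Grouping can
then use the index, e.g. the total for one customer:

   (sum '((This) (: amt)) (collect 'cus '+Inv "ACME"))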

Cheers,
- Alex