> One would think that it shouldn't use that much more memory than the
> size of the actual contents you are trying to convert.
One would, but that would be reckoning without the actual design of the
parser. There are about three copies of everything in memory by the time
it's just about to write the binary file (the source, an abstract parse
tree of every page, which consumes an obscene amount of memory, and a
Palm binary form), as well as a number of additional dictionaries of
various stuff attached to each node. My conclusion after wrestling with
this last fall was that a rewrite of Spider.py would help a lot, but
that any further optimization would have to turn it from a single-pass
into a multi-pass compiler.

> I doubt you will be able to create a Plucker document with that
> many files. The largest Plucker document I have created has about
> 1700 records and it was a PITA to create ;-)

Every night I create a PluckerDoc with about 10,000 links at the
high-water point, but only about 2,300 actual distinct small HTML pages,
each fetched via HTTP from a Web server. It took about 22 minutes last
night. I've never seen anything particularly painful about it, except
for the time it takes.

Bill
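[Editor's illustration, not part of Bill's message or Spider.py's actual
code: a minimal sketch of why a single-pass converter peaks at roughly
three copies of the content, versus a per-page approach that frees
intermediates as it goes. The `parse`/`to_binary` stubs are hypothetical
stand-ins for the real parse-tree and Palm-record stages.]

```python
def parse(html):
    # Stand-in for building the abstract parse tree of a page.
    return html.split()

def to_binary(tree):
    # Stand-in for producing the Palm binary record for a page.
    return " ".join(tree).encode()

def convert_single_pass(pages):
    """Hold source, parse tree, and binary form for ALL pages at once,
    as a single-pass design must, before the output can be written."""
    sources  = {url: html for url, html in pages}                 # copy 1
    trees    = {url: parse(h) for url, h in sources.items()}      # copy 2
    binaries = {url: to_binary(t) for url, t in trees.items()}    # copy 3
    return b"".join(binaries.values())

def convert_per_page(pages):
    """Convert one page at a time; each page's source and tree become
    garbage as soon as its binary record is appended."""
    out = []
    for url, html in pages:
        out.append(to_binary(parse(html)))
    return b"".join(out)
```

Both produce the same output; the difference is only in peak memory,
which is why restructuring the passes (rather than micro-optimizing)
is what would actually help.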
