Matthias Vill wrote:
> I just converted a project, and the log HTML that was created was 6MB
> in size. I agree with you that this is a rare case and that opening
> such a file in a browser is no fun, but I still don't like hardcoded
> sizes. Maybe there will be an all-in-one-page manual for some software
> that exceeds this value, or someone will have a single-page
> picture-database export.
>
> To me it seems cleaner to provide a "--parse-limit=xxBytes" option and
> to implement the parser in a way that it can't exceed that amount of
> memory.
>
> I also guess that loading the whole 15M of a max-sized HTML file at
> once looks ugly in memory and can lead to problems when you have
> multiple wget processes running at once.
>
> Maybe wget should be optimized for HTML files with a max size of 4M,
> and parse anything larger in chunks.
I agree that it's probably a good idea to move HTML parsing to a model that doesn't require slurping everything into memory; but in the meantime I'd like to put some sort of stop-gap solution in place, and limiting the maximum size seems like a reasonable one.

I'd kind of like to use a large hard-coded limit, even if it ends up more like 50MB; but there's a good case to be made for a configurable one, and if everyone is thinking that way, then I'll go with that instead (we'd still need to answer the question of what a good default setting for it is, though). The only reason I'm hoping for hard-coded is that I wanted to get rid of this option when we actually do switch to a model that isn't memory-bound; but then, perhaps we'll want this option for efficiency reasons as well as memory reasons...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/