Matthias Vill wrote:
> I just converted a project, and the log HTML that was created was 6MB
> in size. I agree with you that this is a rare case and that opening
> such a file in a browser is no fun, but I still don't like hardcoded
> sizes. Maybe there will be an all-in-one-page manual for some software
> that exceeds this value, or someone will have a single-page
> picture-database export.
>
> To me it seems cleaner to provide a "--parse-limit=xxBytes" option and
> to implement the parser in a way that it can't exceed that amount of
> memory.
>
> I also guess that loading the whole 15M of a max-sized HTML file at
> once looks ugly in memory and can lead to problems when you have
> multiple wget processes running at once.
>
> Maybe wget should be optimized for HTML files with a max size of 4M,
> and parse anything larger in chunks.
I agree that it's probably a good idea to move HTML parsing to a model that doesn't require slurping everything into memory; but in the meantime I'd like to put some sort of stop-gap solution in place, and limiting the maximum size seems like a reasonable one.

I'd kind of like to use a large hard-coded limit, even if it ends up more like 50MB; but there's a good case to be made for a configurable one, and if everyone is thinking that way, then I'll go with that instead (we'd still need to answer the question of what a good default setting for it is, though). The only reason I'm hoping for hard-coded is that I wanted to get rid of this option when we actually do switch to a model that isn't memory-bound; but then, perhaps we'll want this option for efficiency reasons as well as memory reasons...

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/